2023-01-11T21:02:11.5971453Z Requested labels: linux.8xlarge.nvidia.gpu 2023-01-11T21:02:11.5971578Z Job defined at: pytorch/pytorch/.github/workflows/_linux-test.yml@refs/pull/91627/merge 2023-01-11T21:02:11.5971690Z Reusable workflow chain: 2023-01-11T21:02:11.5971734Z pytorch/pytorch/.github/workflows/pull.yml@refs/pull/91627/merge (57fc38f02f250896a12b32cfa200a6105a03d09c) 2023-01-11T21:02:11.5971777Z -> pytorch/pytorch/.github/workflows/_linux-test.yml@refs/pull/91627/merge (57fc38f02f250896a12b32cfa200a6105a03d09c) 2023-01-11T21:02:11.5971800Z Waiting for a runner to pick up this job... 2023-01-11T21:02:11.8936853Z Job is about to start running on the runner: i-0f0fe094d8805bec6 (organization) 2023-01-11T21:02:18.4590752Z Current runner version: '2.300.2' 2023-01-11T21:02:18.4598155Z Runner name: 'i-0f0fe094d8805bec6' 2023-01-11T21:02:18.4598969Z Runner group name: 'Default' 2023-01-11T21:02:18.4599745Z Machine name: 'ip-10-0-0-157' 2023-01-11T21:02:18.4602427Z ##[group]GITHUB_TOKEN Permissions 2023-01-11T21:02:18.4603326Z Actions: read 2023-01-11T21:02:18.4603762Z Checks: read 2023-01-11T21:02:18.4604129Z Contents: read 2023-01-11T21:02:18.4604564Z Deployments: read 2023-01-11T21:02:18.4605056Z Discussions: read 2023-01-11T21:02:18.4605425Z Issues: read 2023-01-11T21:02:18.4605836Z Metadata: read 2023-01-11T21:02:18.4606271Z Packages: read 2023-01-11T21:02:18.4606639Z Pages: read 2023-01-11T21:02:18.4607125Z PullRequests: read 2023-01-11T21:02:18.4607606Z RepositoryProjects: read 2023-01-11T21:02:18.4608035Z SecurityEvents: read 2023-01-11T21:02:18.4608468Z Statuses: read 2023-01-11T21:02:18.4608943Z ##[endgroup] 2023-01-11T21:02:18.4613039Z Secret source: None 2023-01-11T21:02:18.4613908Z Prepare workflow directory 2023-01-11T21:02:18.5915619Z Prepare all required actions 2023-01-11T21:02:18.6142392Z Getting action download info 2023-01-11T21:02:18.8440804Z Download action repository 'pytorch/test-infra@main' (SHA:2c225610d00fb13c04fcd60389d3e4d8326167c3) 
2023-01-11T21:02:20.3973157Z Download action repository 'pytorch/pytorch@master' (SHA:c5836153f5332ca83d5cacde38f2829a4d54793e) 2023-01-11T21:02:24.4074369Z Download action repository 'seemethere/upload-artifact-s3@v5' (SHA:baba72d0712b404f646cebe0730933554ebce96a) 2023-01-11T21:02:24.7494328Z Getting action download info 2023-01-11T21:02:24.9908216Z Download action repository 'malfet/checkout@silent-checkout' (SHA:c7b8fef48edfe1bca0044a44b1f7f7c4318a3076) 2023-01-11T21:02:25.1820082Z Getting action download info 2023-01-11T21:02:25.4536862Z Download action repository 'nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482' (SHA:3e91a01664abd3c5cd539100d10d33b9c5b68482) 2023-01-11T21:02:25.6158602Z Uses: pytorch/pytorch/.github/workflows/_linux-test.yml 2023-01-11T21:02:25.6161063Z ##[group] Inputs 2023-01-11T21:02:25.6161462Z build-environment: linux-bionic-cuda11.6-py3.10-gcc7 2023-01-11T21:02:25.6162709Z test-matrix: { include: [ { config: "default", shard: 1, num_shards: 4, runner: "linux.4xlarge.nvidia.gpu" }, { config: "default", shard: 2, num_shards: 4, runner: "linux.4xlarge.nvidia.gpu" }, { config: "default", shard: 3, num_shards: 4, runner: "linux.4xlarge.nvidia.gpu" }, { config: "default", shard: 4, num_shards: 4, runner: "linux.4xlarge.nvidia.gpu" }, { config: "distributed", shard: 1, num_shards: 3, runner: "linux.8xlarge.nvidia.gpu" }, { config: "distributed", shard: 2, num_shards: 3, runner: "linux.8xlarge.nvidia.gpu" }, { config: "distributed", shard: 3, num_shards: 3, runner: "linux.8xlarge.nvidia.gpu" }, { config: "functorch", shard: 1, num_shards: 1, runner: "linux.4xlarge.nvidia.gpu" }, { config: "deploy", shard: 1, num_shards: 1, runner: "linux.4xlarge.nvidia.gpu" }, ]} 2023-01-11T21:02:25.6164060Z docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7:fd224c2e6c79d7fdec6408da598bf52bc5b201dd 2023-01-11T21:02:25.6164547Z sync-tag: 2023-01-11T21:02:25.6165612Z timeout-minutes: 240 
2023-01-11T21:02:25.6165888Z use-gha:
2023-01-11T21:02:25.6166154Z ##[endgroup]
2023-01-11T21:02:25.6166975Z Complete job name: linux-bionic-cuda11.6-py3.10-gcc7 / test (distributed, 3, 3, linux.8xlarge.nvidia.gpu)
2023-01-11T21:02:25.7245247Z ##[group]Run pytorch/test-infra/.github/actions/setup-ssh@main
2023-01-11T21:02:25.7245639Z with:
2023-01-11T21:02:25.7246201Z github-secret: ***
2023-01-11T21:02:25.7246673Z instructions: All testing is done inside the container, to start an interactive session run: docker exec -it $(docker container ps --format '{{.ID}}') bash
2023-01-11T21:02:25.7247126Z activate-with-label: false
2023-01-11T21:02:25.7247396Z label: with-ssh
2023-01-11T21:02:25.7247642Z remove-existing-keys: true
2023-01-11T21:02:25.7247899Z env:
2023-01-11T21:02:25.7248139Z GIT_DEFAULT_BRANCH: master
2023-01-11T21:02:25.7248565Z ##[endgroup]
2023-01-11T21:02:26.4129048Z Grabbing public ssh keys from https://github.com/LucaLumetti.keys
2023-01-11T21:02:26.5041085Z ~/.ssh/authorized_keys file found on node, removing ~/.ssh and starting fresh
2023-01-11T21:02:26.5063054Z Public keys pulled and installed to /home/ec2-user/.ssh/authorized_keys
2023-01-11T21:02:26.5107903Z Login using: ssh ec2-user@ec2-3-80-197-209.compute-1.amazonaws.com
2023-01-11T21:02:26.5108378Z All testing is done inside the container, to start an interactive session run:
2023-01-11T21:02:26.5108852Z docker exec -it $(docker container ps --format '{{.ID}}') bash
2023-01-11T21:02:26.5394888Z ##[group]Run pytorch/pytorch/.github/actions/checkout-pytorch@master
2023-01-11T21:02:26.5395261Z with:
2023-01-11T21:02:26.5395496Z submodules: recursive
2023-01-11T21:02:26.5395731Z fetch-depth: 0
2023-01-11T21:02:26.5395958Z env:
2023-01-11T21:02:26.5396191Z GIT_DEFAULT_BRANCH: master
2023-01-11T21:02:26.5396438Z ##[endgroup]
2023-01-11T21:02:26.5683167Z ##[group]Run retry () {
2023-01-11T21:02:26.5683494Z retry () {
2023-01-11T21:02:26.5683784Z   $* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
2023-01-11T21:02:26.5684072Z }
2023-01-11T21:02:26.5684327Z echo "${GITHUB_WORKSPACE}"
2023-01-11T21:02:26.5684607Z if [ -z "${NO_SUDO}" ]; then
2023-01-11T21:02:26.5684914Z   retry sudo rm -rf "${GITHUB_WORKSPACE}"
2023-01-11T21:02:26.5685191Z else
2023-01-11T21:02:26.5685460Z   retry rm -rf "${GITHUB_WORKSPACE}"
2023-01-11T21:02:26.5685707Z fi
2023-01-11T21:02:26.5685995Z mkdir "${GITHUB_WORKSPACE}"
2023-01-11T21:02:26.5704446Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2023-01-11T21:02:26.5704751Z env:
2023-01-11T21:02:26.5705000Z GIT_DEFAULT_BRANCH: master
2023-01-11T21:02:26.5705256Z NO_SUDO:
2023-01-11T21:02:26.5705476Z ##[endgroup]
2023-01-11T21:02:26.5829998Z /home/ec2-user/actions-runner/_work/pytorch/pytorch
2023-01-11T21:02:26.6166562Z ##[group]Run malfet/checkout@silent-checkout
2023-01-11T21:02:26.6166837Z with:
2023-01-11T21:02:26.6167111Z ref: 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e
2023-01-11T21:02:26.6167463Z fetch-depth: 0
2023-01-11T21:02:26.6167846Z submodules: recursive
2023-01-11T21:02:26.6168111Z quiet-checkout: true
2023-01-11T21:02:26.6168382Z repository: pytorch/pytorch
2023-01-11T21:02:26.6168832Z token: ***
2023-01-11T21:02:26.6169058Z ssh-strict: true
2023-01-11T21:02:26.6169325Z persist-credentials: true
2023-01-11T21:02:26.6169582Z clean: true
2023-01-11T21:02:26.6169796Z lfs: false
2023-01-11T21:02:26.6170045Z set-safe-directory: true
2023-01-11T21:02:26.6170285Z env:
2023-01-11T21:02:26.6170499Z GIT_DEFAULT_BRANCH: master
2023-01-11T21:02:26.6170750Z ##[endgroup]
2023-01-11T21:02:26.7654793Z Syncing repository: pytorch/pytorch
2023-01-11T21:02:26.7656967Z ##[group]Getting Git version info
2023-01-11T21:02:26.7657539Z Working directory is '/home/ec2-user/actions-runner/_work/pytorch/pytorch'
2023-01-11T21:02:26.7658126Z [command]/usr/bin/git version
2023-01-11T21:02:26.7658396Z git version 2.38.1
2023-01-11T21:02:26.7662918Z ##[endgroup]
2023-01-11T21:02:26.7682149Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/9acc3f44-bc14-4c50-a41d-e2d044887365' before making global git config changes
2023-01-11T21:02:26.7683363Z Adding repository directory to the temporary git global config as a safe directory
2023-01-11T21:02:26.7689496Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/pytorch/pytorch
2023-01-11T21:02:26.7732841Z Deleting the contents of '/home/ec2-user/actions-runner/_work/pytorch/pytorch'
2023-01-11T21:02:26.7739831Z ##[group]Initializing the repository
2023-01-11T21:02:26.7743593Z [command]/usr/bin/git init /home/ec2-user/actions-runner/_work/pytorch/pytorch
2023-01-11T21:02:26.7773498Z hint: Using 'master' as the name for the initial branch. This default branch name
2023-01-11T21:02:26.7774152Z hint: is subject to change. To configure the initial branch name to use in all
2023-01-11T21:02:26.7774572Z hint: of your new repositories, which will suppress this warning, call:
2023-01-11T21:02:26.7774883Z hint:
2023-01-11T21:02:26.7775242Z hint: git config --global init.defaultBranch <name>
2023-01-11T21:02:26.7775532Z hint:
2023-01-11T21:02:26.7776021Z hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
2023-01-11T21:02:26.7776789Z hint: 'development'. The just-created branch can be renamed via this command:
2023-01-11T21:02:26.7777235Z hint:
2023-01-11T21:02:26.7777808Z hint: git branch -m <name>
2023-01-11T21:02:26.7778355Z Initialized empty Git repository in /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/
2023-01-11T21:02:26.7788006Z [command]/usr/bin/git remote add origin https://github.com/pytorch/pytorch
2023-01-11T21:02:26.7822171Z ##[endgroup]
2023-01-11T21:02:26.7822806Z ##[group]Disabling automatic garbage collection
2023-01-11T21:02:26.7826991Z [command]/usr/bin/git config --local gc.auto 0
2023-01-11T21:02:26.7856858Z ##[endgroup]
2023-01-11T21:02:26.7858005Z ##[group]Setting up auth
2023-01-11T21:02:26.7867174Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand
2023-01-11T21:02:26.7900541Z [command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :
2023-01-11T21:02:26.8224064Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader
2023-01-11T21:02:26.8254585Z [command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :
2023-01-11T21:02:26.8550351Z [command]/usr/bin/git config --local http.https://github.com/.extraheader AUTHORIZATION: basic ***
2023-01-11T21:02:26.8594836Z ##[endgroup]
2023-01-11T21:02:26.8595414Z ##[group]Fetching the repository
2023-01-11T21:02:26.8603862Z [command]/usr/bin/git -c protocol.version=2 fetch --prune --quiet --no-recurse-submodules origin +refs/heads/*:refs/remotes/origin/* +refs/tags/*:refs/tags/*
2023-01-11T21:03:23.0245553Z [command]/usr/bin/git rev-parse --verify --quiet 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e^{object}
2023-01-11T21:03:23.0272147Z 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e
2023-01-11T21:03:23.0278199Z
##[endgroup] 2023-01-11T21:03:23.0278698Z ##[group]Determining the checkout info 2023-01-11T21:03:23.0279839Z ##[endgroup] 2023-01-11T21:03:23.0280306Z ##[group]Checking out the ref 2023-01-11T21:03:23.0285248Z [command]/usr/bin/git checkout --quiet --force 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e 2023-01-11T21:03:24.7508837Z ##[endgroup] 2023-01-11T21:03:24.7509549Z ##[group]Setting up auth for fetching submodules 2023-01-11T21:03:24.7515362Z [command]/usr/bin/git config --global http.https://github.com/.extraheader AUTHORIZATION: basic *** 2023-01-11T21:03:24.7568513Z [command]/usr/bin/git config --global --unset-all url.https://github.com/.insteadOf 2023-01-11T21:03:24.7599972Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf git@github.com: 2023-01-11T21:03:24.7631580Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf org-21003710@github.com: 2023-01-11T21:03:24.7661430Z ##[endgroup] 2023-01-11T21:03:24.7661876Z ##[group]Fetching submodules 2023-01-11T21:03:24.7666859Z [command]/usr/bin/git submodule sync --recursive 2023-01-11T21:03:24.7986862Z [command]/usr/bin/git -c protocol.version=2 submodule update --init --force --recursive 2023-01-11T21:03:24.8287675Z Submodule 'android/libs/fbjni' (https://github.com/facebookincubator/fbjni.git) registered for path 'android/libs/fbjni' 2023-01-11T21:03:24.8290309Z Submodule 'third_party/NNPACK_deps/FP16' (https://github.com/Maratyszcza/FP16.git) registered for path 'third_party/FP16' 2023-01-11T21:03:24.8293214Z Submodule 'third_party/NNPACK_deps/FXdiv' (https://github.com/Maratyszcza/FXdiv.git) registered for path 'third_party/FXdiv' 2023-01-11T21:03:24.8296681Z Submodule 'third_party/NNPACK' (https://github.com/Maratyszcza/NNPACK.git) registered for path 'third_party/NNPACK' 2023-01-11T21:03:24.8300751Z Submodule 'third_party/QNNPACK' (https://github.com/pytorch/QNNPACK) registered for path 'third_party/QNNPACK' 2023-01-11T21:03:24.8304406Z Submodule 
'third_party/VulkanMemoryAllocator' (https://github.com/GPUOpen-LibrariesAndSDKs/VulkanMemoryAllocator.git) registered for path 'third_party/VulkanMemoryAllocator' 2023-01-11T21:03:24.8307978Z Submodule 'third_party/XNNPACK' (https://github.com/google/XNNPACK.git) registered for path 'third_party/XNNPACK' 2023-01-11T21:03:24.8311782Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/benchmark' 2023-01-11T21:03:24.8315609Z Submodule 'third_party/cpuinfo' (https://github.com/pytorch/cpuinfo.git) registered for path 'third_party/cpuinfo' 2023-01-11T21:03:24.8319681Z Submodule 'third_party/cub' (https://github.com/NVlabs/cub.git) registered for path 'third_party/cub' 2023-01-11T21:03:24.8323959Z Submodule 'third_party/cudnn_frontend' (https://github.com/NVIDIA/cudnn-frontend.git) registered for path 'third_party/cudnn_frontend' 2023-01-11T21:03:24.8328190Z Submodule 'third_party/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'third_party/cutlass' 2023-01-11T21:03:24.8332806Z Submodule 'third_party/eigen' (https://gitlab.com/libeigen/eigen.git) registered for path 'third_party/eigen' 2023-01-11T21:03:24.8337586Z Submodule 'third_party/fbgemm' (https://github.com/pytorch/fbgemm) registered for path 'third_party/fbgemm' 2023-01-11T21:03:24.8343574Z Submodule 'third_party/flatbuffers' (https://github.com/google/flatbuffers.git) registered for path 'third_party/flatbuffers' 2023-01-11T21:03:24.8348301Z Submodule 'third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/fmt' 2023-01-11T21:03:24.8353333Z Submodule 'third_party/foxi' (https://github.com/houseroad/foxi.git) registered for path 'third_party/foxi' 2023-01-11T21:03:24.8358575Z Submodule 'third_party/gemmlowp/gemmlowp' (https://github.com/google/gemmlowp.git) registered for path 'third_party/gemmlowp/gemmlowp' 2023-01-11T21:03:24.8363684Z Submodule 'third_party/gloo' 
(https://github.com/facebookincubator/gloo) registered for path 'third_party/gloo' 2023-01-11T21:03:24.8369272Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/googletest' 2023-01-11T21:03:24.8374973Z Submodule 'third_party/ideep' (https://github.com/intel/ideep) registered for path 'third_party/ideep' 2023-01-11T21:03:24.8381301Z Submodule 'third_party/ios-cmake' (https://github.com/Yangqing/ios-cmake.git) registered for path 'third_party/ios-cmake' 2023-01-11T21:03:24.8387079Z Submodule 'third_party/ittapi' (https://github.com/intel/ittapi.git) registered for path 'third_party/ittapi' 2023-01-11T21:03:24.8392917Z Submodule 'third_party/kineto' (https://github.com/pytorch/kineto) registered for path 'third_party/kineto' 2023-01-11T21:03:24.8398938Z Submodule 'third_party/nccl/nccl' (https://github.com/NVIDIA/nccl) registered for path 'third_party/nccl/nccl' 2023-01-11T21:03:24.8405213Z Submodule 'third_party/neon2sse' (https://github.com/intel/ARM_NEON_2_x86_SSE.git) registered for path 'third_party/neon2sse' 2023-01-11T21:03:24.8411523Z Submodule 'third_party/nlohmann' (https://github.com/nlohmann/json.git) registered for path 'third_party/nlohmann' 2023-01-11T21:03:24.8417949Z Submodule 'third_party/onnx' (https://github.com/onnx/onnx.git) registered for path 'third_party/onnx' 2023-01-11T21:03:24.8425049Z Submodule 'third_party/onnx-tensorrt' (https://github.com/onnx/onnx-tensorrt) registered for path 'third_party/onnx-tensorrt' 2023-01-11T21:03:24.8431620Z Submodule 'third_party/pocketfft' (https://github.com/mreineck/pocketfft) registered for path 'third_party/pocketfft' 2023-01-11T21:03:24.8438445Z Submodule 'third_party/protobuf' (https://github.com/protocolbuffers/protobuf.git) registered for path 'third_party/protobuf' 2023-01-11T21:03:24.8445403Z Submodule 'third_party/NNPACK_deps/psimd' (https://github.com/Maratyszcza/psimd.git) registered for path 'third_party/psimd' 
2023-01-11T21:03:24.8452536Z Submodule 'third_party/NNPACK_deps/pthreadpool' (https://github.com/Maratyszcza/pthreadpool.git) registered for path 'third_party/pthreadpool' 2023-01-11T21:03:24.8460324Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/pybind11' 2023-01-11T21:03:24.8467770Z Submodule 'third_party/python-enum' (https://github.com/PeachPy/enum34.git) registered for path 'third_party/python-enum' 2023-01-11T21:03:24.8475215Z Submodule 'third_party/python-peachpy' (https://github.com/malfet/PeachPy.git) registered for path 'third_party/python-peachpy' 2023-01-11T21:03:24.8482699Z Submodule 'third_party/python-six' (https://github.com/benjaminp/six.git) registered for path 'third_party/python-six' 2023-01-11T21:03:24.8490389Z Submodule 'third_party/sleef' (https://github.com/shibatch/sleef) registered for path 'third_party/sleef' 2023-01-11T21:03:24.8499786Z Submodule 'third_party/tbb' (https://github.com/01org/tbb) registered for path 'third_party/tbb' 2023-01-11T21:03:24.8507968Z Submodule 'third_party/tensorpipe' (https://github.com/pytorch/tensorpipe.git) registered for path 'third_party/tensorpipe' 2023-01-11T21:03:24.8516276Z Submodule 'third_party/zstd' (https://github.com/facebook/zstd.git) registered for path 'third_party/zstd' 2023-01-11T21:03:24.8545063Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/android/libs/fbjni'... 2023-01-11T21:03:25.1482398Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/FP16'... 2023-01-11T21:03:25.4240689Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/FXdiv'... 2023-01-11T21:03:25.7521467Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/NNPACK'... 2023-01-11T21:03:26.1018575Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/QNNPACK'... 
2023-01-11T21:03:26.4287357Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/VulkanMemoryAllocator'... 2023-01-11T21:03:28.6184508Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/XNNPACK'... 2023-01-11T21:03:34.6089025Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/benchmark'... 2023-01-11T21:03:35.0697971Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cpuinfo'... 2023-01-11T21:03:35.7003094Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cub'... 2023-01-11T21:03:37.3196481Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cudnn_frontend'... 2023-01-11T21:03:38.6684435Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cutlass'... 2023-01-11T21:03:40.0602330Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/eigen'... 2023-01-11T21:03:47.3797469Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm'... 2023-01-11T21:03:48.2140750Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flatbuffers'... 2023-01-11T21:03:49.8502767Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fmt'... 2023-01-11T21:03:51.0271242Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/foxi'... 2023-01-11T21:03:51.2531473Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/gemmlowp/gemmlowp'... 2023-01-11T21:03:51.8007173Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/gloo'... 2023-01-11T21:03:52.1604516Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/googletest'... 2023-01-11T21:03:53.5033398Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep'... 
2023-01-11T21:03:54.2680676Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ios-cmake'... 2023-01-11T21:03:54.5068783Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ittapi'... 2023-01-11T21:03:54.7924203Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto'... 2023-01-11T21:03:56.4132016Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/nccl/nccl'... 2023-01-11T21:03:56.8045446Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/neon2sse'... 2023-01-11T21:03:57.2126755Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/nlohmann'... 2023-01-11T21:04:03.3408770Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx'... 2023-01-11T21:04:05.3335353Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx-tensorrt'... 2023-01-11T21:04:05.8640057Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pocketfft'... 2023-01-11T21:04:06.2437176Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf'... 2023-01-11T21:04:12.5257830Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/psimd'... 2023-01-11T21:04:12.7786368Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pthreadpool'... 2023-01-11T21:04:13.0300228Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pybind11'... 2023-01-11T21:04:14.0043986Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/python-enum'... 2023-01-11T21:04:14.2487006Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/python-peachpy'... 2023-01-11T21:04:14.6383460Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/python-six'... 
2023-01-11T21:04:14.9774853Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/sleef'... 2023-01-11T21:04:15.8741195Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tbb'... 2023-01-11T21:04:18.6592524Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe'... 2023-01-11T21:04:19.3703347Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/zstd'... 2023-01-11T21:04:21.7684590Z Submodule path 'android/libs/fbjni': checked out '7e1e1fe3858c63c251c637ae41a20de425dde96f' 2023-01-11T21:04:21.7807120Z Submodule path 'third_party/FP16': checked out '4dfe081cf6bcd15db339cf2680b9281b8451eeb3' 2023-01-11T21:04:21.7902805Z Submodule path 'third_party/FXdiv': checked out 'b408327ac2a15ec3e43352421954f5b1967701d1' 2023-01-11T21:04:21.8180565Z Submodule path 'third_party/NNPACK': checked out 'c07e3a0400713d546e0dea2d5466dd22ea389c73' 2023-01-11T21:04:21.8448727Z Submodule path 'third_party/QNNPACK': checked out '7d2a4e9931a82adc3814275b6219a03e24e36b4c' 2023-01-11T21:04:21.8899166Z Submodule path 'third_party/VulkanMemoryAllocator': checked out 'a6bfc237255a6bac1513f7c1ebde6d8aed6b5191' 2023-01-11T21:04:22.6585007Z Submodule path 'third_party/XNNPACK': checked out 'ae108ef49aa5623b896fc93d4298c49d1750d9ba' 2023-01-11T21:04:22.6832785Z Submodule path 'third_party/benchmark': checked out '0d98dba29d66e93259db7daa53a9327df767a415' 2023-01-11T21:04:22.8035937Z Submodule path 'third_party/cpuinfo': checked out '8ec7bd91ad0470e61cf38f618cc1f270dede599c' 2023-01-11T21:04:22.8435685Z Submodule path 'third_party/cub': checked out 'd106ddb991a56c3df1b6d51b2409e36ba8181ce4' 2023-01-11T21:04:23.2036989Z Submodule path 'third_party/cudnn_frontend': checked out '171a7a986f7fbd9ed71bd0cf3c7ad4f55843d6b3' 2023-01-11T21:04:23.7132104Z Submodule path 'third_party/cutlass': checked out 'b72cbf957df8cf84a6d0ff91c190ad51a9c1d24a' 2023-01-11T21:04:24.0072799Z Submodule path 
'third_party/eigen': checked out '3147391d946bb4b6c68edd901f2add6ac1f31f8c' 2023-01-11T21:04:24.0627995Z Submodule path 'third_party/fbgemm': checked out '80d64206c07879fd4683be66873de7cefa1a0a71' 2023-01-11T21:04:24.0645784Z Submodule 'third_party/asmjit' (https://github.com/asmjit/asmjit.git) registered for path 'third_party/fbgemm/third_party/asmjit' 2023-01-11T21:04:24.0648881Z Submodule 'third_party/cpuinfo' (https://github.com/pytorch/cpuinfo) registered for path 'third_party/fbgemm/third_party/cpuinfo' 2023-01-11T21:04:24.0652091Z Submodule 'third_party/googletest' (https://github.com/google/googletest) registered for path 'third_party/fbgemm/third_party/googletest' 2023-01-11T21:04:24.0655633Z Submodule 'third_party/hipify_torch' (https://github.com/ROCmSoftwarePlatform/hipify_torch.git) registered for path 'third_party/fbgemm/third_party/hipify_torch' 2023-01-11T21:04:24.0683328Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/third_party/asmjit'... 2023-01-11T21:04:25.7512976Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/third_party/cpuinfo'... 2023-01-11T21:04:26.3370319Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/third_party/googletest'... 2023-01-11T21:04:27.3348740Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/third_party/hipify_torch'... 
2023-01-11T21:04:27.7287960Z Submodule path 'third_party/fbgemm/third_party/asmjit': checked out 'd3fbf7c9bc7c1d1365a94a45614b91c5a3706b81' 2023-01-11T21:04:27.8525896Z Submodule path 'third_party/fbgemm/third_party/cpuinfo': checked out 'ed8b86a253800bafdb7b25c5c399f91bff9cb1f3' 2023-01-11T21:04:27.9215406Z Submodule path 'third_party/fbgemm/third_party/googletest': checked out 'cbf019de22c8dd37b2108da35b2748fd702d1796' 2023-01-11T21:04:27.9327917Z Submodule path 'third_party/fbgemm/third_party/hipify_torch': checked out '1840658c184f3eeba787dae0f06c45756c1daaf5' 2023-01-11T21:04:28.0373325Z Submodule path 'third_party/flatbuffers': checked out 'd0cede9c90c5257537c293517a21376408b549fa' 2023-01-11T21:04:28.0795728Z Submodule path 'third_party/fmt': checked out '7bdf0628b1276379886c7f6dda2cef2b3b374f0b' 2023-01-11T21:04:28.0895043Z Submodule path 'third_party/foxi': checked out 'c278588e34e535f0bb8f00df3880d26928038cad' 2023-01-11T21:04:28.1358588Z Submodule path 'third_party/gemmlowp/gemmlowp': checked out '3fb5c176c17c765a3492cd2f0321b0dab712f350' 2023-01-11T21:04:28.1633752Z Submodule path 'third_party/gloo': checked out '4a5e339b764261d20fc409071dc7a8b8989aa195' 2023-01-11T21:04:28.2157986Z Submodule path 'third_party/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929' 2023-01-11T21:04:28.2289206Z Submodule path 'third_party/ideep': checked out 'e533c771a1e75a1c225c14b2261eefa62681d9e6' 2023-01-11T21:04:28.2305838Z Submodule 'mkl-dnn' (https://github.com/intel/mkl-dnn.git) registered for path 'third_party/ideep/mkl-dnn' 2023-01-11T21:04:28.2332765Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep/mkl-dnn'... 
2023-01-11T21:04:36.9872793Z Submodule path 'third_party/ideep/mkl-dnn': checked out '404ad76ee633c939d705eb583ffe50a806969d5e' 2023-01-11T21:04:36.9893013Z Submodule 'third_party/oneDNN' (https://github.com/oneapi-src/oneDNN.git) registered for path 'third_party/ideep/mkl-dnn/third_party/oneDNN' 2023-01-11T21:04:36.9920313Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep/mkl-dnn/third_party/oneDNN'... 2023-01-11T21:04:45.7825213Z Submodule path 'third_party/ideep/mkl-dnn/third_party/oneDNN': checked out 'fbec3e25a559ee252022ae066817b204e106a6ba' 2023-01-11T21:04:45.7940846Z Submodule path 'third_party/ios-cmake': checked out '8abaed637d56f1337d6e1d2c4026e25c1eade724' 2023-01-11T21:04:45.8109869Z Submodule path 'third_party/ittapi': checked out '5b8a7d7422611c3a0d799fb5fc5dd4abfae35b42' 2023-01-11T21:04:45.9222703Z Submodule path 'third_party/kineto': checked out '6c1629809068efd78a8d56b4aa479c7ec49ae562' 2023-01-11T21:04:45.9240013Z Submodule 'libkineto/third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/kineto/libkineto/third_party/fmt' 2023-01-11T21:04:45.9243193Z Submodule 'libkineto/third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/googletest' 2023-01-11T21:04:45.9270343Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/fmt'... 2023-01-11T21:04:47.6083250Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/googletest'... 
2023-01-11T21:04:48.6765203Z Submodule path 'third_party/kineto/libkineto/third_party/fmt': checked out '2591ab91c3898c9f6544fff04660276537d32ffd' 2023-01-11T21:04:48.7404655Z Submodule path 'third_party/kineto/libkineto/third_party/googletest': checked out '7aca84427f224eeed3144123d5230d5871e93347' 2023-01-11T21:04:48.7651821Z Submodule path 'third_party/nccl/nccl': checked out 'f89fd4777d2ef9229c039ff750ae21da01626f52' 2023-01-11T21:04:48.7803784Z Submodule path 'third_party/neon2sse': checked out '97a126f08ce318023be604d03f88bf0820a9464a' 2023-01-11T21:04:48.9104610Z Submodule path 'third_party/nlohmann': checked out '87cda1d6646592ac5866dc703c8e1839046a6806' 2023-01-11T21:04:49.2263185Z Submodule path 'third_party/onnx': checked out 'f7ee1ac60d06abe8e26c9b6bbe1e3db5286b614b' 2023-01-11T21:04:49.2294474Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/onnx/third_party/benchmark' 2023-01-11T21:04:49.2297625Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/onnx/third_party/pybind11' 2023-01-11T21:04:49.2325485Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx/third_party/benchmark'... 2023-01-11T21:04:49.7384449Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx/third_party/pybind11'... 
2023-01-11T21:04:50.6947504Z Submodule path 'third_party/onnx/third_party/benchmark': checked out '0d98dba29d66e93259db7daa53a9327df767a415' 2023-01-11T21:04:50.7321029Z Submodule path 'third_party/onnx/third_party/pybind11': checked out 'ffa346860b306c9bbfb341aed9c14c067751feb8' 2023-01-11T21:04:50.7494887Z Submodule path 'third_party/onnx-tensorrt': checked out 'c153211418a7c57ce071d9ce2a41f8d1c85a878f' 2023-01-11T21:04:50.7512138Z Submodule 'third_party/onnx' (https://github.com/onnx/onnx.git) registered for path 'third_party/onnx-tensorrt/third_party/onnx' 2023-01-11T21:04:50.7539162Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx-tensorrt/third_party/onnx'... 2023-01-11T21:04:53.0263459Z Submodule path 'third_party/onnx-tensorrt/third_party/onnx': checked out '765f5ee823a67a866f4bd28a9860e81f3c811ce8' 2023-01-11T21:04:53.0285776Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark' 2023-01-11T21:04:53.0289696Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11' 2023-01-11T21:04:53.0317353Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark'... 2023-01-11T21:04:53.4678729Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11'... 
2023-01-11T21:04:54.4204542Z Submodule path 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark': checked out 'e776aa0275e293707b6a0901e0e8d8a8a3679508' 2023-01-11T21:04:54.5000902Z Submodule path 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11': checked out 'a1041190c8b8ff0cd9e2f0752248ad5e3789ea0c' 2023-01-11T21:04:54.5017421Z Submodule 'tools/clang' (https://github.com/wjakob/clang-cindex-python3) registered for path 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2023-01-11T21:04:54.5043118Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang'... 2023-01-11T21:04:54.7699830Z Submodule path 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5' 2023-01-11T21:04:54.7801230Z Submodule path 'third_party/pocketfft': checked out 'ea778e37710c07723435b1be58235996d1d43a5a' 2023-01-11T21:04:55.0979515Z Submodule path 'third_party/protobuf': checked out 'd1eca4e4b421cd2997495c4b4e65cea6be4e9b8a' 2023-01-11T21:04:55.1001625Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/protobuf/third_party/benchmark' 2023-01-11T21:04:55.1004926Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/protobuf/third_party/googletest' 2023-01-11T21:04:55.1032768Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf/third_party/benchmark'... 2023-01-11T21:04:55.5555739Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf/third_party/googletest'... 
2023-01-11T21:04:57.9175722Z Submodule path 'third_party/protobuf/third_party/benchmark': checked out '5b7683f49e1e9223cf9927b24f6fd3d6bd82e3f8' 2023-01-11T21:04:57.9991523Z Submodule path 'third_party/protobuf/third_party/googletest': checked out '5ec7f0c4a113e2f18ac2c6cc7df51ad6afc24081' 2023-01-11T21:04:58.0084762Z Submodule path 'third_party/psimd': checked out '072586a71b55b7f8c584153d223e95687148a900' 2023-01-11T21:04:58.0208364Z Submodule path 'third_party/pthreadpool': checked out 'a134dd5d4cee80cce15db81a72e7f929d71dd413' 2023-01-11T21:04:58.0596942Z Submodule path 'third_party/pybind11': checked out '80dc998efced8ceb2be59756668a7e90e8bef917' 2023-01-11T21:04:58.0696848Z Submodule path 'third_party/python-enum': checked out '4cfedc426c4e2fc52e3f5c2b4297e15ed8d6b8c7' 2023-01-11T21:04:58.1033230Z Submodule path 'third_party/python-peachpy': checked out 'f45429b087dd7d5bc78bb40dc7cf06425c252d67' 2023-01-11T21:04:58.1135998Z Submodule path 'third_party/python-six': checked out '15e31431af97e5e64b80af0a3f598d382bcdd49a' 2023-01-11T21:04:58.1657736Z Submodule path 'third_party/sleef': checked out 'e0a003ee838b75d11763aa9c3ef17bf71a725bff' 2023-01-11T21:04:58.3053938Z Submodule path 'third_party/tbb': checked out 'a51a90bc609bb73db8ea13841b5cf7aa4344d4a9' 2023-01-11T21:04:58.3359741Z Submodule path 'third_party/tensorpipe': checked out '52791a2fd214b2a9dc5759d36725909c1daa7f2e' 2023-01-11T21:04:58.3377763Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/tensorpipe/third_party/googletest' 2023-01-11T21:04:58.3381377Z Submodule 'third_party/libnop' (https://github.com/google/libnop.git) registered for path 'third_party/tensorpipe/third_party/libnop' 2023-01-11T21:04:58.3384664Z Submodule 'third_party/libuv' (https://github.com/libuv/libuv.git) registered for path 'third_party/tensorpipe/third_party/libuv' 2023-01-11T21:04:58.3388043Z Submodule 'third_party/pybind11' 
(https://github.com/pybind/pybind11.git) registered for path 'third_party/tensorpipe/third_party/pybind11' 2023-01-11T21:04:58.3415669Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/googletest'... 2023-01-11T21:04:59.3115589Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libnop'... 2023-01-11T21:04:59.6046694Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libuv'... 2023-01-11T21:05:00.8599239Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11'... 2023-01-11T21:05:02.1625347Z Submodule path 'third_party/tensorpipe/third_party/googletest': checked out 'aee0f9d9b5b87796ee8a0ab26b7587ec30e8858e' 2023-01-11T21:05:02.1797101Z Submodule path 'third_party/tensorpipe/third_party/libnop': checked out '910b55815be16109f04f4180e9adee14fb4ce281' 2023-01-11T21:05:02.2558031Z Submodule path 'third_party/tensorpipe/third_party/libuv': checked out '1dff88e5161cba5c59276d2070d2e304e4dcb242' 2023-01-11T21:05:02.2880541Z Submodule path 'third_party/tensorpipe/third_party/pybind11': checked out 'a23996fce38ff6ccfbcdc09f1e63f2c4be5ea2ef' 2023-01-11T21:05:02.2897195Z Submodule 'tools/clang' (https://github.com/wjakob/clang-cindex-python3) registered for path 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2023-01-11T21:05:02.2925121Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11/tools/clang'... 
2023-01-11T21:05:03.0057527Z Submodule path 'third_party/tensorpipe/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5' 2023-01-11T21:05:03.1677060Z Submodule path 'third_party/zstd': checked out 'aec56a52fbab207fc639a1937d1e708a282edca8' 2023-01-11T21:05:03.1708496Z [command]/usr/bin/git submodule foreach --recursive git config --local gc.auto 0 2023-01-11T21:05:03.2026946Z Entering 'android/libs/fbjni' 2023-01-11T21:05:03.2071251Z Entering 'third_party/FP16' 2023-01-11T21:05:03.2113558Z Entering 'third_party/FXdiv' 2023-01-11T21:05:03.2157014Z Entering 'third_party/NNPACK' 2023-01-11T21:05:03.2201160Z Entering 'third_party/QNNPACK' 2023-01-11T21:05:03.2245066Z Entering 'third_party/VulkanMemoryAllocator' 2023-01-11T21:05:03.2288896Z Entering 'third_party/XNNPACK' 2023-01-11T21:05:03.2343742Z Entering 'third_party/benchmark' 2023-01-11T21:05:03.2386759Z Entering 'third_party/cpuinfo' 2023-01-11T21:05:03.2429709Z Entering 'third_party/cub' 2023-01-11T21:05:03.2471854Z Entering 'third_party/cudnn_frontend' 2023-01-11T21:05:03.2519355Z Entering 'third_party/cutlass' 2023-01-11T21:05:03.2569019Z Entering 'third_party/eigen' 2023-01-11T21:05:03.2613627Z Entering 'third_party/fbgemm' 2023-01-11T21:05:03.2655652Z Entering 'third_party/fbgemm/third_party/asmjit' 2023-01-11T21:05:03.2700996Z Entering 'third_party/fbgemm/third_party/cpuinfo' 2023-01-11T21:05:03.2743685Z Entering 'third_party/fbgemm/third_party/googletest' 2023-01-11T21:05:03.2785936Z Entering 'third_party/fbgemm/third_party/hipify_torch' 2023-01-11T21:05:03.2828995Z Entering 'third_party/flatbuffers' 2023-01-11T21:05:03.2873724Z Entering 'third_party/fmt' 2023-01-11T21:05:03.2915753Z Entering 'third_party/foxi' 2023-01-11T21:05:03.2958129Z Entering 'third_party/gemmlowp/gemmlowp' 2023-01-11T21:05:03.2999969Z Entering 'third_party/gloo' 2023-01-11T21:05:03.3042998Z Entering 'third_party/googletest' 2023-01-11T21:05:03.3085980Z Entering 'third_party/ideep' 
2023-01-11T21:05:03.3127706Z Entering 'third_party/ideep/mkl-dnn' 2023-01-11T21:05:03.3171709Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN' 2023-01-11T21:05:03.3222409Z Entering 'third_party/ios-cmake' 2023-01-11T21:05:03.3265327Z Entering 'third_party/ittapi' 2023-01-11T21:05:03.3306656Z Entering 'third_party/kineto' 2023-01-11T21:05:03.3348359Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2023-01-11T21:05:03.3390526Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2023-01-11T21:05:03.3434819Z Entering 'third_party/nccl/nccl' 2023-01-11T21:05:03.3477421Z Entering 'third_party/neon2sse' 2023-01-11T21:05:03.3519170Z Entering 'third_party/nlohmann' 2023-01-11T21:05:03.3563365Z Entering 'third_party/onnx' 2023-01-11T21:05:03.3618752Z Entering 'third_party/onnx/third_party/benchmark' 2023-01-11T21:05:03.3661060Z Entering 'third_party/onnx/third_party/pybind11' 2023-01-11T21:05:03.3704659Z Entering 'third_party/onnx-tensorrt' 2023-01-11T21:05:03.3746938Z Entering 'third_party/onnx-tensorrt/third_party/onnx' 2023-01-11T21:05:03.3794953Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark' 2023-01-11T21:05:03.3837712Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11' 2023-01-11T21:05:03.3880285Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2023-01-11T21:05:03.3926627Z Entering 'third_party/pocketfft' 2023-01-11T21:05:03.3969150Z Entering 'third_party/protobuf' 2023-01-11T21:05:03.4014968Z Entering 'third_party/protobuf/third_party/benchmark' 2023-01-11T21:05:03.4056672Z Entering 'third_party/protobuf/third_party/googletest' 2023-01-11T21:05:03.4100549Z Entering 'third_party/psimd' 2023-01-11T21:05:03.4143293Z Entering 'third_party/pthreadpool' 2023-01-11T21:05:03.4185041Z Entering 'third_party/pybind11' 2023-01-11T21:05:03.4227366Z Entering 'third_party/python-enum' 2023-01-11T21:05:03.4269186Z Entering 'third_party/python-peachpy' 
2023-01-11T21:05:03.4310919Z Entering 'third_party/python-six' 2023-01-11T21:05:03.4353293Z Entering 'third_party/sleef' 2023-01-11T21:05:03.4395264Z Entering 'third_party/tbb' 2023-01-11T21:05:03.4439793Z Entering 'third_party/tensorpipe' 2023-01-11T21:05:03.4481906Z Entering 'third_party/tensorpipe/third_party/googletest' 2023-01-11T21:05:03.4524582Z Entering 'third_party/tensorpipe/third_party/libnop' 2023-01-11T21:05:03.4566651Z Entering 'third_party/tensorpipe/third_party/libuv' 2023-01-11T21:05:03.4608835Z Entering 'third_party/tensorpipe/third_party/pybind11' 2023-01-11T21:05:03.4649788Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2023-01-11T21:05:03.4695022Z Entering 'third_party/zstd' 2023-01-11T21:05:03.4748796Z ##[endgroup] 2023-01-11T21:05:03.4749320Z ##[group]Persisting credentials for submodules 2023-01-11T21:05:03.4756537Z [command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'url\.https\:\/\/github\.com\/\.insteadOf' && git config --local --unset-all 'url.https://github.com/.insteadOf' || : 2023-01-11T21:05:03.5064762Z Entering 'android/libs/fbjni' 2023-01-11T21:05:03.5105780Z Entering 'third_party/FP16' 2023-01-11T21:05:03.5147732Z Entering 'third_party/FXdiv' 2023-01-11T21:05:03.5188965Z Entering 'third_party/NNPACK' 2023-01-11T21:05:03.5230231Z Entering 'third_party/QNNPACK' 2023-01-11T21:05:03.5271797Z Entering 'third_party/VulkanMemoryAllocator' 2023-01-11T21:05:03.5313420Z Entering 'third_party/XNNPACK' 2023-01-11T21:05:03.5365137Z Entering 'third_party/benchmark' 2023-01-11T21:05:03.5407263Z Entering 'third_party/cpuinfo' 2023-01-11T21:05:03.5449324Z Entering 'third_party/cub' 2023-01-11T21:05:03.5490495Z Entering 'third_party/cudnn_frontend' 2023-01-11T21:05:03.5537619Z Entering 'third_party/cutlass' 2023-01-11T21:05:03.5585890Z Entering 'third_party/eigen' 2023-01-11T21:05:03.5629988Z Entering 'third_party/fbgemm' 2023-01-11T21:05:03.5670962Z Entering 
'third_party/fbgemm/third_party/asmjit' 2023-01-11T21:05:03.5712777Z Entering 'third_party/fbgemm/third_party/cpuinfo' 2023-01-11T21:05:03.5754700Z Entering 'third_party/fbgemm/third_party/googletest' 2023-01-11T21:05:03.5797526Z Entering 'third_party/fbgemm/third_party/hipify_torch' 2023-01-11T21:05:03.5840034Z Entering 'third_party/flatbuffers' 2023-01-11T21:05:03.5882908Z Entering 'third_party/fmt' 2023-01-11T21:05:03.5925298Z Entering 'third_party/foxi' 2023-01-11T21:05:03.5966173Z Entering 'third_party/gemmlowp/gemmlowp' 2023-01-11T21:05:03.6007166Z Entering 'third_party/gloo' 2023-01-11T21:05:03.6049370Z Entering 'third_party/googletest' 2023-01-11T21:05:03.6091191Z Entering 'third_party/ideep' 2023-01-11T21:05:03.6131287Z Entering 'third_party/ideep/mkl-dnn' 2023-01-11T21:05:03.6174943Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN' 2023-01-11T21:05:03.6223915Z Entering 'third_party/ios-cmake' 2023-01-11T21:05:03.6264872Z Entering 'third_party/ittapi' 2023-01-11T21:05:03.6306517Z Entering 'third_party/kineto' 2023-01-11T21:05:03.6348325Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2023-01-11T21:05:03.6389567Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2023-01-11T21:05:03.6431998Z Entering 'third_party/nccl/nccl' 2023-01-11T21:05:03.6473280Z Entering 'third_party/neon2sse' 2023-01-11T21:05:03.6514087Z Entering 'third_party/nlohmann' 2023-01-11T21:05:03.6556508Z Entering 'third_party/onnx' 2023-01-11T21:05:03.6609954Z Entering 'third_party/onnx/third_party/benchmark' 2023-01-11T21:05:03.6652997Z Entering 'third_party/onnx/third_party/pybind11' 2023-01-11T21:05:03.6695787Z Entering 'third_party/onnx-tensorrt' 2023-01-11T21:05:03.6735898Z Entering 'third_party/onnx-tensorrt/third_party/onnx' 2023-01-11T21:05:03.6782058Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark' 2023-01-11T21:05:03.6823008Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11' 
2023-01-11T21:05:03.6865676Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2023-01-11T21:05:03.6912410Z Entering 'third_party/pocketfft' 2023-01-11T21:05:03.6952885Z Entering 'third_party/protobuf' 2023-01-11T21:05:03.6997642Z Entering 'third_party/protobuf/third_party/benchmark' 2023-01-11T21:05:03.7038791Z Entering 'third_party/protobuf/third_party/googletest' 2023-01-11T21:05:03.7081664Z Entering 'third_party/psimd' 2023-01-11T21:05:03.7122884Z Entering 'third_party/pthreadpool' 2023-01-11T21:05:03.7163967Z Entering 'third_party/pybind11' 2023-01-11T21:05:03.7204974Z Entering 'third_party/python-enum' 2023-01-11T21:05:03.7245872Z Entering 'third_party/python-peachpy' 2023-01-11T21:05:03.7287042Z Entering 'third_party/python-six' 2023-01-11T21:05:03.7329372Z Entering 'third_party/sleef' 2023-01-11T21:05:03.7370875Z Entering 'third_party/tbb' 2023-01-11T21:05:03.7412990Z Entering 'third_party/tensorpipe' 2023-01-11T21:05:03.7454333Z Entering 'third_party/tensorpipe/third_party/googletest' 2023-01-11T21:05:03.7494512Z Entering 'third_party/tensorpipe/third_party/libnop' 2023-01-11T21:05:03.7534677Z Entering 'third_party/tensorpipe/third_party/libuv' 2023-01-11T21:05:03.7575329Z Entering 'third_party/tensorpipe/third_party/pybind11' 2023-01-11T21:05:03.7614990Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2023-01-11T21:05:03.7658606Z Entering 'third_party/zstd' 2023-01-11T21:05:03.7713913Z [command]/usr/bin/git submodule foreach --recursive git config --local 'http.https://github.com/.extraheader' 'AUTHORIZATION: basic ***' && git config --local --show-origin --name-only --get-regexp remote.origin.url 2023-01-11T21:05:03.8020257Z Entering 'android/libs/fbjni' 2023-01-11T21:05:03.8057971Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2023-01-11T21:05:03.8075449Z Entering 'third_party/FP16' 2023-01-11T21:05:03.8114432Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2023-01-11T21:05:03.8131059Z Entering 'third_party/FXdiv' 2023-01-11T21:05:03.8170500Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url 2023-01-11T21:05:03.8188143Z Entering 'third_party/NNPACK' 2023-01-11T21:05:03.8227255Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2023-01-11T21:05:03.8244745Z Entering 'third_party/QNNPACK' 2023-01-11T21:05:03.8283357Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/QNNPACK/config remote.origin.url 2023-01-11T21:05:03.8301041Z Entering 'third_party/VulkanMemoryAllocator' 2023-01-11T21:05:03.8339621Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url 2023-01-11T21:05:03.8357254Z Entering 'third_party/XNNPACK' 2023-01-11T21:05:03.8396060Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2023-01-11T21:05:03.8424069Z Entering 'third_party/benchmark' 2023-01-11T21:05:03.8462348Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url 2023-01-11T21:05:03.8479399Z Entering 'third_party/cpuinfo' 2023-01-11T21:05:03.8518841Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2023-01-11T21:05:03.8536374Z Entering 'third_party/cub' 2023-01-11T21:05:03.8574237Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cub/config remote.origin.url 2023-01-11T21:05:03.8592155Z Entering 'third_party/cudnn_frontend' 2023-01-11T21:05:03.8630109Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config 
remote.origin.url 2023-01-11T21:05:03.8652689Z Entering 'third_party/cutlass' 2023-01-11T21:05:03.8691560Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url 2023-01-11T21:05:03.8715931Z Entering 'third_party/eigen' 2023-01-11T21:05:03.8753831Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/eigen/config remote.origin.url 2023-01-11T21:05:03.8773196Z Entering 'third_party/fbgemm' 2023-01-11T21:05:03.8812453Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2023-01-11T21:05:03.8830359Z Entering 'third_party/fbgemm/third_party/asmjit' 2023-01-11T21:05:03.8869463Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/third_party/asmjit/config remote.origin.url 2023-01-11T21:05:03.8886101Z Entering 'third_party/fbgemm/third_party/cpuinfo' 2023-01-11T21:05:03.8925847Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/third_party/cpuinfo/config remote.origin.url 2023-01-11T21:05:03.8944848Z Entering 'third_party/fbgemm/third_party/googletest' 2023-01-11T21:05:03.8983882Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/third_party/googletest/config remote.origin.url 2023-01-11T21:05:03.9000555Z Entering 'third_party/fbgemm/third_party/hipify_torch' 2023-01-11T21:05:03.9039504Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/third_party/hipify_torch/config remote.origin.url 2023-01-11T21:05:03.9057282Z Entering 'third_party/flatbuffers' 2023-01-11T21:05:03.9095328Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2023-01-11T21:05:03.9114898Z Entering 'third_party/fmt' 2023-01-11T21:05:03.9152635Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url 2023-01-11T21:05:03.9169815Z Entering 'third_party/foxi' 2023-01-11T21:05:03.9208963Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/foxi/config remote.origin.url 2023-01-11T21:05:03.9226474Z Entering 'third_party/gemmlowp/gemmlowp' 2023-01-11T21:05:03.9264320Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url 2023-01-11T21:05:03.9281444Z Entering 'third_party/gloo' 2023-01-11T21:05:03.9319174Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url 2023-01-11T21:05:03.9336437Z Entering 'third_party/googletest' 2023-01-11T21:05:03.9375171Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2023-01-11T21:05:03.9392649Z Entering 'third_party/ideep' 2023-01-11T21:05:03.9430507Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2023-01-11T21:05:03.9446465Z Entering 'third_party/ideep/mkl-dnn' 2023-01-11T21:05:03.9484628Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url 2023-01-11T21:05:03.9504616Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN' 2023-01-11T21:05:03.9542751Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/modules/third_party/oneDNN/config remote.origin.url 2023-01-11T21:05:03.9566239Z Entering 'third_party/ios-cmake' 2023-01-11T21:05:03.9605253Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ios-cmake/config remote.origin.url 2023-01-11T21:05:03.9622813Z Entering 'third_party/ittapi' 2023-01-11T21:05:03.9661009Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url 2023-01-11T21:05:03.9677837Z Entering 'third_party/kineto' 2023-01-11T21:05:03.9716304Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url 2023-01-11T21:05:03.9732880Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2023-01-11T21:05:03.9771055Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2023-01-11T21:05:03.9789176Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2023-01-11T21:05:03.9827461Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 2023-01-11T21:05:03.9845409Z Entering 'third_party/nccl/nccl' 2023-01-11T21:05:03.9884008Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nccl/nccl/config remote.origin.url 2023-01-11T21:05:03.9902175Z Entering 'third_party/neon2sse' 2023-01-11T21:05:03.9940135Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/neon2sse/config remote.origin.url 2023-01-11T21:05:03.9956819Z Entering 'third_party/nlohmann' 2023-01-11T21:05:03.9995354Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url 2023-01-11T21:05:04.0013346Z Entering 'third_party/onnx' 2023-01-11T21:05:04.0052678Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2023-01-11T21:05:04.0082025Z Entering 'third_party/onnx/third_party/benchmark' 2023-01-11T21:05:04.0120336Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/benchmark/config remote.origin.url 2023-01-11T21:05:04.0137154Z Entering 'third_party/onnx/third_party/pybind11' 
2023-01-11T21:05:04.0175969Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2023-01-11T21:05:04.0194849Z Entering 'third_party/onnx-tensorrt' 2023-01-11T21:05:04.0234620Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx-tensorrt/config remote.origin.url 2023-01-11T21:05:04.0250444Z Entering 'third_party/onnx-tensorrt/third_party/onnx' 2023-01-11T21:05:04.0288692Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx-tensorrt/modules/third_party/onnx/config remote.origin.url 2023-01-11T21:05:04.0310480Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark' 2023-01-11T21:05:04.0349083Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx-tensorrt/modules/third_party/onnx/modules/third_party/benchmark/config remote.origin.url 2023-01-11T21:05:04.0366343Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11' 2023-01-11T21:05:04.0405745Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx-tensorrt/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2023-01-11T21:05:04.0423013Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2023-01-11T21:05:04.0461715Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx-tensorrt/modules/third_party/onnx/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2023-01-11T21:05:04.0483110Z Entering 'third_party/pocketfft' 2023-01-11T21:05:04.0521680Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 2023-01-11T21:05:04.0538760Z Entering 'third_party/protobuf' 2023-01-11T21:05:04.0576401Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config 
remote.origin.url 2023-01-11T21:05:04.0597348Z Entering 'third_party/protobuf/third_party/benchmark' 2023-01-11T21:05:04.0635442Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2023-01-11T21:05:04.0652607Z Entering 'third_party/protobuf/third_party/googletest' 2023-01-11T21:05:04.0690823Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 2023-01-11T21:05:04.0710047Z Entering 'third_party/psimd' 2023-01-11T21:05:04.0748385Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2023-01-11T21:05:04.0765360Z Entering 'third_party/pthreadpool' 2023-01-11T21:05:04.0803880Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2023-01-11T21:05:04.0821450Z Entering 'third_party/pybind11' 2023-01-11T21:05:04.0859556Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2023-01-11T21:05:04.0876702Z Entering 'third_party/python-enum' 2023-01-11T21:05:04.0914563Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-enum/config remote.origin.url 2023-01-11T21:05:04.0931747Z Entering 'third_party/python-peachpy' 2023-01-11T21:05:04.0970690Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2023-01-11T21:05:04.0988031Z Entering 'third_party/python-six' 2023-01-11T21:05:04.1025852Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-six/config remote.origin.url 2023-01-11T21:05:04.1042767Z Entering 'third_party/sleef' 2023-01-11T21:05:04.1082960Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config 
remote.origin.url 2023-01-11T21:05:04.1100367Z Entering 'third_party/tbb' 2023-01-11T21:05:04.1138371Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tbb/config remote.origin.url 2023-01-11T21:05:04.1157437Z Entering 'third_party/tensorpipe' 2023-01-11T21:05:04.1196084Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2023-01-11T21:05:04.1212758Z Entering 'third_party/tensorpipe/third_party/googletest' 2023-01-11T21:05:04.1252613Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2023-01-11T21:05:04.1269636Z Entering 'third_party/tensorpipe/third_party/libnop' 2023-01-11T21:05:04.1307261Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2023-01-11T21:05:04.1324222Z Entering 'third_party/tensorpipe/third_party/libuv' 2023-01-11T21:05:04.1363094Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2023-01-11T21:05:04.1380480Z Entering 'third_party/tensorpipe/third_party/pybind11' 2023-01-11T21:05:04.1417822Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2023-01-11T21:05:04.1434694Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2023-01-11T21:05:04.1473679Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2023-01-11T21:05:04.1493358Z Entering 'third_party/zstd' 2023-01-11T21:05:04.1531668Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/zstd/config remote.origin.url 2023-01-11T21:05:04.2434888Z [command]/usr/bin/git 
submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'git@github.com:' 2023-01-11T21:05:04.2740503Z Entering 'android/libs/fbjni' 2023-01-11T21:05:04.2782897Z Entering 'third_party/FP16' 2023-01-11T21:05:04.2824999Z Entering 'third_party/FXdiv' 2023-01-11T21:05:04.2866691Z Entering 'third_party/NNPACK' 2023-01-11T21:05:04.2909498Z Entering 'third_party/QNNPACK' 2023-01-11T21:05:04.2951511Z Entering 'third_party/VulkanMemoryAllocator' 2023-01-11T21:05:04.2994736Z Entering 'third_party/XNNPACK' 2023-01-11T21:05:04.3047382Z Entering 'third_party/benchmark' 2023-01-11T21:05:04.3090002Z Entering 'third_party/cpuinfo' 2023-01-11T21:05:04.3133290Z Entering 'third_party/cub' 2023-01-11T21:05:04.3175750Z Entering 'third_party/cudnn_frontend' 2023-01-11T21:05:04.3223853Z Entering 'third_party/cutlass' 2023-01-11T21:05:04.3272982Z Entering 'third_party/eigen' 2023-01-11T21:05:04.3318737Z Entering 'third_party/fbgemm' 2023-01-11T21:05:04.3361292Z Entering 'third_party/fbgemm/third_party/asmjit' 2023-01-11T21:05:04.3402908Z Entering 'third_party/fbgemm/third_party/cpuinfo' 2023-01-11T21:05:04.3445124Z Entering 'third_party/fbgemm/third_party/googletest' 2023-01-11T21:05:04.3486548Z Entering 'third_party/fbgemm/third_party/hipify_torch' 2023-01-11T21:05:04.3529079Z Entering 'third_party/flatbuffers' 2023-01-11T21:05:04.3573505Z Entering 'third_party/fmt' 2023-01-11T21:05:04.3616033Z Entering 'third_party/foxi' 2023-01-11T21:05:04.3658851Z Entering 'third_party/gemmlowp/gemmlowp' 2023-01-11T21:05:04.3702601Z Entering 'third_party/gloo' 2023-01-11T21:05:04.3745920Z Entering 'third_party/googletest' 2023-01-11T21:05:04.3788783Z Entering 'third_party/ideep' 2023-01-11T21:05:04.3830266Z Entering 'third_party/ideep/mkl-dnn' 2023-01-11T21:05:04.3873880Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN' 2023-01-11T21:05:04.3922132Z Entering 'third_party/ios-cmake' 2023-01-11T21:05:04.3965396Z Entering 'third_party/ittapi' 
2023-01-11T21:05:04.4006830Z Entering 'third_party/kineto' 2023-01-11T21:05:04.4050678Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2023-01-11T21:05:04.4093541Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2023-01-11T21:05:04.4137473Z Entering 'third_party/nccl/nccl' 2023-01-11T21:05:04.4180893Z Entering 'third_party/neon2sse' 2023-01-11T21:05:04.4222452Z Entering 'third_party/nlohmann' 2023-01-11T21:05:04.4267971Z Entering 'third_party/onnx' 2023-01-11T21:05:04.4323033Z Entering 'third_party/onnx/third_party/benchmark' 2023-01-11T21:05:04.4365464Z Entering 'third_party/onnx/third_party/pybind11' 2023-01-11T21:05:04.4410283Z Entering 'third_party/onnx-tensorrt' 2023-01-11T21:05:04.4452962Z Entering 'third_party/onnx-tensorrt/third_party/onnx' 2023-01-11T21:05:04.4500550Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark' 2023-01-11T21:05:04.4542904Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11' 2023-01-11T21:05:04.4585801Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2023-01-11T21:05:04.4632303Z Entering 'third_party/pocketfft' 2023-01-11T21:05:04.4673726Z Entering 'third_party/protobuf' 2023-01-11T21:05:04.4719060Z Entering 'third_party/protobuf/third_party/benchmark' 2023-01-11T21:05:04.4761552Z Entering 'third_party/protobuf/third_party/googletest' 2023-01-11T21:05:04.4805323Z Entering 'third_party/psimd' 2023-01-11T21:05:04.4846712Z Entering 'third_party/pthreadpool' 2023-01-11T21:05:04.4889267Z Entering 'third_party/pybind11' 2023-01-11T21:05:04.4931689Z Entering 'third_party/python-enum' 2023-01-11T21:05:04.4973976Z Entering 'third_party/python-peachpy' 2023-01-11T21:05:04.5015849Z Entering 'third_party/python-six' 2023-01-11T21:05:04.5059170Z Entering 'third_party/sleef' 2023-01-11T21:05:04.5100953Z Entering 'third_party/tbb' 2023-01-11T21:05:04.5146430Z Entering 'third_party/tensorpipe' 2023-01-11T21:05:04.5189820Z Entering 
'third_party/tensorpipe/third_party/googletest' 2023-01-11T21:05:04.5231925Z Entering 'third_party/tensorpipe/third_party/libnop' 2023-01-11T21:05:04.5273532Z Entering 'third_party/tensorpipe/third_party/libuv' 2023-01-11T21:05:04.5315826Z Entering 'third_party/tensorpipe/third_party/pybind11' 2023-01-11T21:05:04.5357181Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2023-01-11T21:05:04.5401843Z Entering 'third_party/zstd' 2023-01-11T21:05:04.5456694Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'org-21003710@github.com:' 2023-01-11T21:05:04.5765894Z Entering 'android/libs/fbjni' 2023-01-11T21:05:04.5807832Z Entering 'third_party/FP16' 2023-01-11T21:05:04.5850970Z Entering 'third_party/FXdiv' 2023-01-11T21:05:04.5893188Z Entering 'third_party/NNPACK' 2023-01-11T21:05:04.5935474Z Entering 'third_party/QNNPACK' 2023-01-11T21:05:04.5977905Z Entering 'third_party/VulkanMemoryAllocator' 2023-01-11T21:05:04.6022331Z Entering 'third_party/XNNPACK' 2023-01-11T21:05:04.6075069Z Entering 'third_party/benchmark' 2023-01-11T21:05:04.6118300Z Entering 'third_party/cpuinfo' 2023-01-11T21:05:04.6160732Z Entering 'third_party/cub' 2023-01-11T21:05:04.6203180Z Entering 'third_party/cudnn_frontend' 2023-01-11T21:05:04.6251566Z Entering 'third_party/cutlass' 2023-01-11T21:05:04.6300558Z Entering 'third_party/eigen' 2023-01-11T21:05:04.6345651Z Entering 'third_party/fbgemm' 2023-01-11T21:05:04.6387836Z Entering 'third_party/fbgemm/third_party/asmjit' 2023-01-11T21:05:04.6429312Z Entering 'third_party/fbgemm/third_party/cpuinfo' 2023-01-11T21:05:04.6471724Z Entering 'third_party/fbgemm/third_party/googletest' 2023-01-11T21:05:04.6513278Z Entering 'third_party/fbgemm/third_party/hipify_torch' 2023-01-11T21:05:04.6555803Z Entering 'third_party/flatbuffers' 2023-01-11T21:05:04.6600143Z Entering 'third_party/fmt' 2023-01-11T21:05:04.6642061Z Entering 'third_party/foxi' 2023-01-11T21:05:04.6683952Z 
Entering 'third_party/gemmlowp/gemmlowp' 2023-01-11T21:05:04.6726046Z Entering 'third_party/gloo' 2023-01-11T21:05:04.6768272Z Entering 'third_party/googletest' 2023-01-11T21:05:04.6811223Z Entering 'third_party/ideep' 2023-01-11T21:05:04.6853534Z Entering 'third_party/ideep/mkl-dnn' 2023-01-11T21:05:04.6896805Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN' 2023-01-11T21:05:04.6946021Z Entering 'third_party/ios-cmake' 2023-01-11T21:05:04.6988464Z Entering 'third_party/ittapi' 2023-01-11T21:05:04.7030798Z Entering 'third_party/kineto' 2023-01-11T21:05:04.7072464Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2023-01-11T21:05:04.7115332Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2023-01-11T21:05:04.7159025Z Entering 'third_party/nccl/nccl' 2023-01-11T21:05:04.7202278Z Entering 'third_party/neon2sse' 2023-01-11T21:05:04.7244045Z Entering 'third_party/nlohmann' 2023-01-11T21:05:04.7287028Z Entering 'third_party/onnx' 2023-01-11T21:05:04.7343232Z Entering 'third_party/onnx/third_party/benchmark' 2023-01-11T21:05:04.7384765Z Entering 'third_party/onnx/third_party/pybind11' 2023-01-11T21:05:04.7428793Z Entering 'third_party/onnx-tensorrt' 2023-01-11T21:05:04.7469609Z Entering 'third_party/onnx-tensorrt/third_party/onnx' 2023-01-11T21:05:04.7517879Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark' 2023-01-11T21:05:04.7559952Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11' 2023-01-11T21:05:04.7602748Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2023-01-11T21:05:04.7650077Z Entering 'third_party/pocketfft' 2023-01-11T21:05:04.7692606Z Entering 'third_party/protobuf' 2023-01-11T21:05:04.7738314Z Entering 'third_party/protobuf/third_party/benchmark' 2023-01-11T21:05:04.7781565Z Entering 'third_party/protobuf/third_party/googletest' 2023-01-11T21:05:04.7825188Z Entering 'third_party/psimd' 2023-01-11T21:05:04.7867559Z Entering 
'third_party/pthreadpool' 2023-01-11T21:05:04.7910090Z Entering 'third_party/pybind11' 2023-01-11T21:05:04.7951480Z Entering 'third_party/python-enum' 2023-01-11T21:05:04.7993198Z Entering 'third_party/python-peachpy' 2023-01-11T21:05:04.8035308Z Entering 'third_party/python-six' 2023-01-11T21:05:04.8077263Z Entering 'third_party/sleef' 2023-01-11T21:05:04.8119305Z Entering 'third_party/tbb' 2023-01-11T21:05:04.8164279Z Entering 'third_party/tensorpipe' 2023-01-11T21:05:04.8206519Z Entering 'third_party/tensorpipe/third_party/googletest' 2023-01-11T21:05:04.8248302Z Entering 'third_party/tensorpipe/third_party/libnop' 2023-01-11T21:05:04.8290078Z Entering 'third_party/tensorpipe/third_party/libuv' 2023-01-11T21:05:04.8331941Z Entering 'third_party/tensorpipe/third_party/pybind11' 2023-01-11T21:05:04.8373027Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2023-01-11T21:05:04.8418236Z Entering 'third_party/zstd' 2023-01-11T21:05:04.8470088Z ##[endgroup] 2023-01-11T21:05:04.8511284Z [command]/usr/bin/git log -1 --format='%H' 2023-01-11T21:05:04.8539743Z '8419ddda87c8a47eacc63b54bc7ec98c1f27c26e' 2023-01-11T21:05:04.8693067Z Prepare all required actions 2023-01-11T21:05:04.8728303Z ##[group]Run ./.github/actions/setup-linux 2023-01-11T21:05:04.8728624Z env: 2023-01-11T21:05:04.8728911Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:05:04.8729193Z ##[endgroup] 2023-01-11T21:05:04.8751117Z ##[group]Run set -euo pipefail 2023-01-11T21:05:04.8751474Z set -euo pipefail 2023-01-11T21:05:04.8751811Z function get_ec2_metadata() { 2023-01-11T21:05:04.8752171Z  # Pulled from instance metadata endpoint for EC2 2023-01-11T21:05:04.8752681Z  # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html 2023-01-11T21:05:04.8753126Z  category=$1 2023-01-11T21:05:04.8753498Z  curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" 2023-01-11T21:05:04.8753823Z } 2023-01-11T21:05:04.8754139Z echo "ami-id: $(get_ec2_metadata ami-id)" 
2023-01-11T21:05:04.8754565Z echo "instance-id: $(get_ec2_metadata instance-id)" 2023-01-11T21:05:04.8754965Z echo "instance-type: $(get_ec2_metadata instance-type)" 2023-01-11T21:05:04.8755340Z echo "system info $(uname -a)" 2023-01-11T21:05:04.8768045Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T21:05:04.8768359Z env: 2023-01-11T21:05:04.8768638Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:05:04.8768937Z ##[endgroup] 2023-01-11T21:05:04.8869203Z ami-id: ami-096198a0bccc6bad4 2023-01-11T21:05:04.8930569Z instance-id: i-0f0fe094d8805bec6 2023-01-11T21:05:04.8992931Z instance-type: g3.8xlarge 2023-01-11T21:05:04.9001144Z system info Linux ip-10-0-0-157.ec2.internal 4.14.252-195.483.amzn2.x86_64 #1 SMP Mon Nov 1 20:58:46 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux 2023-01-11T21:05:04.9021976Z ##[group]Run if systemctl is-active --quiet docker; then 2023-01-11T21:05:04.9022349Z if systemctl is-active --quiet docker; then 2023-01-11T21:05:04.9022679Z  echo "Docker daemon is running..."; 2023-01-11T21:05:04.9022957Z else 2023-01-11T21:05:04.9023254Z  echo "Starting docker deamon..." && sudo systemctl start docker; 2023-01-11T21:05:04.9023559Z fi 2023-01-11T21:05:04.9034812Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T21:05:04.9035091Z env: 2023-01-11T21:05:04.9035334Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:05:04.9035591Z ##[endgroup] 2023-01-11T21:05:04.9085710Z Docker daemon is running... 
2023-01-11T21:05:04.9107521Z ##[group]Run AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") 2023-01-11T21:05:04.9108033Z AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") 2023-01-11T21:05:04.9108455Z retry () { "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") } 2023-01-11T21:05:04.9108983Z retry aws ecr get-login*** "$AWS_DEFAULT_REGION" | docker login --username AWS \ 2023-01-11T21:05:04.9109475Z  --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" 2023-01-11T21:05:04.9120647Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T21:05:04.9121000Z env: 2023-01-11T21:05:04.9121285Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:05:04.9121595Z AWS_RETRY_MODE: standard 2023-01-11T21:05:04.9121877Z AWS_MAX_ATTEMPTS: 5 2023-01-11T21:05:04.9122192Z AWS_DEFAULT_REGION: us-east-1 2023-01-11T21:05:04.9122491Z ##[endgroup] 2023-01-11T21:05:05.8407071Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json. 2023-01-11T21:05:05.8407536Z Configure a credential helper to remove this warning. 
See 2023-01-11T21:05:05.8408043Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2023-01-11T21:05:05.8408309Z 2023-01-11T21:05:05.8409436Z Login Succeeded 2023-01-11T21:05:05.8488131Z ##[group]Run env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2023-01-11T21:05:05.8488534Z env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2023-01-11T21:05:05.8489022Z env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2023-01-11T21:05:05.8501870Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T21:05:05.8502149Z env: 2023-01-11T21:05:05.8502391Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:05:05.8502658Z ##[endgroup] 2023-01-11T21:05:05.8592646Z ##[group]Run pytorch/test-infra/.github/actions/pull-docker-image@main 2023-01-11T21:05:05.8592997Z with: 2023-01-11T21:05:05.8593492Z docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7:fd224c2e6c79d7fdec6408da598bf52bc5b201dd 2023-01-11T21:05:05.8593966Z env: 2023-01-11T21:05:05.8594191Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:05:05.8594459Z ##[endgroup] 2023-01-11T21:05:05.8612363Z ##[group]Run retry () { "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") } 2023-01-11T21:05:05.8612732Z retry () { "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") } 2023-01-11T21:05:05.8613095Z # ignore output since only exit code is used for conditional 2023-01-11T21:05:05.8613477Z # only pull docker image if it's not available locally 2023-01-11T21:05:05.8613876Z if ! 
docker inspect --type=image "${DOCKER_IMAGE}" >/dev/null 2>/dev/null; then 2023-01-11T21:05:05.8614272Z  retry docker pull "${DOCKER_IMAGE}" 2023-01-11T21:05:05.8614547Z fi 2023-01-11T21:05:05.8626516Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T21:05:05.8626818Z env: 2023-01-11T21:05:05.8627060Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:05:05.8627562Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7:fd224c2e6c79d7fdec6408da598bf52bc5b201dd 2023-01-11T21:05:05.8628046Z ##[endgroup] 2023-01-11T21:05:06.1007888Z fd224c2e6c79d7fdec6408da598bf52bc5b201dd: Pulling from pytorch/pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7 2023-01-11T21:05:06.1008361Z fb668870d8a7: Pulling fs layer 2023-01-11T21:05:06.1008649Z 3dc32ed140fb: Pulling fs layer 2023-01-11T21:05:06.1008925Z 54a1df240516: Pulling fs layer 2023-01-11T21:05:06.1009185Z cf378b3cb3c7: Pulling fs layer 2023-01-11T21:05:06.1009453Z 9b4412378859: Pulling fs layer 2023-01-11T21:05:06.1010741Z 502253a1be21: Pulling fs layer 2023-01-11T21:05:06.1011879Z 5c7dd67e5809: Pulling fs layer 2023-01-11T21:05:06.1013690Z bdfd23ed3f48: Pulling fs layer 2023-01-11T21:05:06.1013996Z aee1dd761bdd: Pulling fs layer 2023-01-11T21:05:06.1014285Z 5feda9af2542: Pulling fs layer 2023-01-11T21:05:06.1017240Z f8371ecb849a: Pulling fs layer 2023-01-11T21:05:06.1017665Z ce4a87d45645: Pulling fs layer 2023-01-11T21:05:06.1017942Z 39629f7269f9: Pulling fs layer 2023-01-11T21:05:06.1018216Z 87d0ffa55850: Pulling fs layer 2023-01-11T21:05:06.1018468Z 70702f8b5bc4: Pulling fs layer 2023-01-11T21:05:06.1018733Z 0c06be5c20e0: Pulling fs layer 2023-01-11T21:05:06.1019003Z b372c2a3bc3f: Pulling fs layer 2023-01-11T21:05:06.1019256Z 582d081a59fa: Pulling fs layer 2023-01-11T21:05:06.1019557Z e1c655e7ec0e: Pulling fs layer 2023-01-11T21:05:06.1020599Z cf378b3cb3c7: Waiting 2023-01-11T21:05:06.1020852Z c7726d39d806: Pulling fs layer 2023-01-11T21:05:06.1021117Z 
1c22f2f8c01b: Pulling fs layer 2023-01-11T21:05:06.1021395Z b8f759fd0191: Pulling fs layer 2023-01-11T21:05:06.1021637Z aee1dd761bdd: Waiting 2023-01-11T21:05:06.1021889Z 39629f7269f9: Waiting 2023-01-11T21:05:06.1022149Z e28e73a4bddd: Pulling fs layer 2023-01-11T21:05:06.1022578Z 5feda9af2542: Waiting 2023-01-11T21:05:06.1022850Z 90d8f9bbe048: Pulling fs layer 2023-01-11T21:05:06.1023104Z f8371ecb849a: Waiting 2023-01-11T21:05:06.1023362Z b34bd39d0461: Pulling fs layer 2023-01-11T21:05:06.1023595Z 87d0ffa55850: Waiting 2023-01-11T21:05:06.1023834Z ce4a87d45645: Waiting 2023-01-11T21:05:06.1024082Z 2f2308643d60: Pulling fs layer 2023-01-11T21:05:06.1024325Z 8e3432e5a569: Pulling fs layer 2023-01-11T21:05:06.1025012Z 9ea746919509: Pulling fs layer 2023-01-11T21:05:06.1025279Z 1a2fd7b216d7: Pulling fs layer 2023-01-11T21:05:06.1025522Z b372c2a3bc3f: Waiting 2023-01-11T21:05:06.1025995Z 19fde6a723a0: Pulling fs layer 2023-01-11T21:05:06.1026745Z bdfd23ed3f48: Waiting 2023-01-11T21:05:06.1027228Z 06369252d749: Pulling fs layer 2023-01-11T21:05:06.1027823Z 582d081a59fa: Waiting 2023-01-11T21:05:06.1028255Z 502253a1be21: Waiting 2023-01-11T21:05:06.1028692Z ea4bfeaa0fc7: Pulling fs layer 2023-01-11T21:05:06.1029001Z a1d16b6a5070: Pulling fs layer 2023-01-11T21:05:06.1029243Z 1c22f2f8c01b: Waiting 2023-01-11T21:05:06.1029507Z f550b7ff2470: Pulling fs layer 2023-01-11T21:05:06.1029778Z 12ddc57b99eb: Pulling fs layer 2023-01-11T21:05:06.1030027Z 8345085fb0a0: Pulling fs layer 2023-01-11T21:05:06.1030291Z 4cc94dbec031: Pulling fs layer 2023-01-11T21:05:06.1030559Z 29a7c0d5fa4c: Pulling fs layer 2023-01-11T21:05:06.1030807Z 25571655d0e1: Pulling fs layer 2023-01-11T21:05:06.1031127Z bdf297d7f88c: Pulling fs layer 2023-01-11T21:05:06.1031560Z 06369252d749: Waiting 2023-01-11T21:05:06.1031841Z 0b3950af8ae1: Pulling fs layer 2023-01-11T21:05:06.1032186Z 6d68f7da8baa: Pulling fs layer 2023-01-11T21:05:06.1032516Z 9ea746919509: Waiting 2023-01-11T21:05:06.1032788Z 2f2308643d60: 
Waiting 2023-01-11T21:05:06.1033134Z 1a2fd7b216d7: Waiting 2023-01-11T21:05:06.1033450Z e28e73a4bddd: Waiting 2023-01-11T21:05:06.1033711Z 90d8f9bbe048: Waiting 2023-01-11T21:05:06.1034322Z b34bd39d0461: Waiting 2023-01-11T21:05:06.1034668Z a1d16b6a5070: Waiting 2023-01-11T21:05:06.1034993Z 19fde6a723a0: Waiting 2023-01-11T21:05:06.1035332Z cca768f96df4: Pulling fs layer 2023-01-11T21:05:06.1036373Z 8c3cf3d5e1c5: Pulling fs layer 2023-01-11T21:05:06.1036732Z bdf297d7f88c: Waiting 2023-01-11T21:05:06.1037311Z 61eecfa8b34e: Pulling fs layer 2023-01-11T21:05:06.1038039Z 4cc94dbec031: Waiting 2023-01-11T21:05:06.1038509Z 29a7c0d5fa4c: Waiting 2023-01-11T21:05:06.1038790Z 95c1ac011645: Pulling fs layer 2023-01-11T21:05:06.1039111Z 25571655d0e1: Waiting 2023-01-11T21:05:06.1039480Z 3046cc00c4ca: Pulling fs layer 2023-01-11T21:05:06.1039775Z 195d560d8cf6: Pulling fs layer 2023-01-11T21:05:06.1040108Z 3046cc00c4ca: Waiting 2023-01-11T21:05:06.1040420Z 95c1ac011645: Waiting 2023-01-11T21:05:06.1040693Z 77250abd5ca4: Pulling fs layer 2023-01-11T21:05:06.1041063Z 881b24daf9c5: Pulling fs layer 2023-01-11T21:05:06.1041420Z 9fbf0a18619e: Pulling fs layer 2023-01-11T21:05:06.1041704Z 02048a597c22: Pulling fs layer 2023-01-11T21:05:06.1042031Z b8f759fd0191: Waiting 2023-01-11T21:05:06.1042373Z 881b24daf9c5: Waiting 2023-01-11T21:05:06.1042673Z 9fbf0a18619e: Waiting 2023-01-11T21:05:06.1043000Z 859052a25d95: Pulling fs layer 2023-01-11T21:05:06.1043332Z 3e03143da3c2: Pulling fs layer 2023-01-11T21:05:06.1043603Z 8e3432e5a569: Waiting 2023-01-11T21:05:06.1043913Z 3e03143da3c2: Waiting 2023-01-11T21:05:06.1044256Z 12ddc57b99eb: Waiting 2023-01-11T21:05:06.1044748Z 8c3cf3d5e1c5: Waiting 2023-01-11T21:05:06.1045241Z 70702f8b5bc4: Waiting 2023-01-11T21:05:06.1045593Z 5c7dd67e5809: Waiting 2023-01-11T21:05:06.1045850Z 02048a597c22: Waiting 2023-01-11T21:05:06.2479401Z 3dc32ed140fb: Verifying Checksum 2023-01-11T21:05:06.2479861Z 3dc32ed140fb: Download complete 
2023-01-11T21:05:06.3302622Z cf378b3cb3c7: Verifying Checksum 2023-01-11T21:05:06.3303012Z cf378b3cb3c7: Download complete 2023-01-11T21:05:06.4026779Z 54a1df240516: Download complete 2023-01-11T21:05:06.4316219Z fb668870d8a7: Download complete 2023-01-11T21:05:06.4351802Z 9b4412378859: Verifying Checksum 2023-01-11T21:05:06.4352234Z 9b4412378859: Download complete 2023-01-11T21:05:06.5154061Z 5c7dd67e5809: Download complete 2023-01-11T21:05:06.5182058Z bdfd23ed3f48: Verifying Checksum 2023-01-11T21:05:06.5183305Z bdfd23ed3f48: Download complete 2023-01-11T21:05:06.6026045Z aee1dd761bdd: Verifying Checksum 2023-01-11T21:05:06.6026472Z aee1dd761bdd: Download complete 2023-01-11T21:05:06.6883338Z f8371ecb849a: Verifying Checksum 2023-01-11T21:05:06.6883852Z f8371ecb849a: Download complete 2023-01-11T21:05:06.7670033Z ce4a87d45645: Download complete 2023-01-11T21:05:07.1634167Z fb668870d8a7: Pull complete 2023-01-11T21:05:07.4401707Z 3dc32ed140fb: Pull complete 2023-01-11T21:05:07.9995401Z 54a1df240516: Pull complete 2023-01-11T21:05:08.1005456Z cf378b3cb3c7: Pull complete 2023-01-11T21:05:08.2164333Z 9b4412378859: Pull complete 2023-01-11T21:05:10.5863835Z 39629f7269f9: Download complete 2023-01-11T21:05:10.7221983Z 87d0ffa55850: Verifying Checksum 2023-01-11T21:05:10.7222588Z 87d0ffa55850: Download complete 2023-01-11T21:05:10.8228843Z 70702f8b5bc4: Verifying Checksum 2023-01-11T21:05:10.8229185Z 70702f8b5bc4: Download complete 2023-01-11T21:05:10.8984012Z 0c06be5c20e0: Verifying Checksum 2023-01-11T21:05:10.8984384Z 0c06be5c20e0: Download complete 2023-01-11T21:05:13.3386125Z b372c2a3bc3f: Verifying Checksum 2023-01-11T21:05:13.4400132Z b372c2a3bc3f: Download complete 2023-01-11T21:05:13.4400750Z 582d081a59fa: Verifying Checksum 2023-01-11T21:05:13.4401138Z 582d081a59fa: Download complete 2023-01-11T21:05:13.5029793Z e1c655e7ec0e: Verifying Checksum 2023-01-11T21:05:13.5030245Z e1c655e7ec0e: Download complete 2023-01-11T21:05:17.6589720Z 502253a1be21: Verifying 
Checksum 2023-01-11T21:05:17.6590414Z 502253a1be21: Download complete 2023-01-11T21:05:17.7615905Z 1c22f2f8c01b: Verifying Checksum 2023-01-11T21:05:17.7616360Z 1c22f2f8c01b: Download complete 2023-01-11T21:05:17.8379916Z b8f759fd0191: Verifying Checksum 2023-01-11T21:05:17.8380474Z b8f759fd0191: Download complete 2023-01-11T21:05:17.9276651Z e28e73a4bddd: Verifying Checksum 2023-01-11T21:05:17.9276996Z e28e73a4bddd: Download complete 2023-01-11T21:05:18.0160679Z 90d8f9bbe048: Download complete 2023-01-11T21:05:18.1051491Z b34bd39d0461: Verifying Checksum 2023-01-11T21:05:18.1051821Z b34bd39d0461: Download complete 2023-01-11T21:05:18.2015732Z 2f2308643d60: Verifying Checksum 2023-01-11T21:05:18.2016371Z 2f2308643d60: Download complete 2023-01-11T21:05:19.7032911Z 8e3432e5a569: Verifying Checksum 2023-01-11T21:05:19.7033300Z 8e3432e5a569: Download complete 2023-01-11T21:05:19.7917888Z 9ea746919509: Download complete 2023-01-11T21:05:19.8975235Z 1a2fd7b216d7: Download complete 2023-01-11T21:05:19.9911732Z 19fde6a723a0: Verifying Checksum 2023-01-11T21:05:19.9912053Z 19fde6a723a0: Download complete 2023-01-11T21:05:20.0933454Z 06369252d749: Verifying Checksum 2023-01-11T21:05:20.0933764Z 06369252d749: Download complete 2023-01-11T21:05:20.1993372Z ea4bfeaa0fc7: Verifying Checksum 2023-01-11T21:05:20.1993726Z ea4bfeaa0fc7: Download complete 2023-01-11T21:05:20.8831201Z 5feda9af2542: Verifying Checksum 2023-01-11T21:05:20.8831558Z 5feda9af2542: Download complete 2023-01-11T21:05:20.9792245Z f550b7ff2470: Verifying Checksum 2023-01-11T21:05:20.9792545Z f550b7ff2470: Download complete 2023-01-11T21:05:21.0927237Z 12ddc57b99eb: Download complete 2023-01-11T21:05:21.8688411Z 8345085fb0a0: Verifying Checksum 2023-01-11T21:05:21.8688752Z 8345085fb0a0: Download complete 2023-01-11T21:05:21.9634644Z 4cc94dbec031: Verifying Checksum 2023-01-11T21:05:21.9635100Z 4cc94dbec031: Download complete 2023-01-11T21:05:22.0845996Z 29a7c0d5fa4c: Verifying Checksum 
2023-01-11T21:05:22.0846378Z 29a7c0d5fa4c: Download complete 2023-01-11T21:05:22.4340879Z 25571655d0e1: Verifying Checksum 2023-01-11T21:05:22.4341236Z 25571655d0e1: Download complete 2023-01-11T21:05:22.5191788Z bdf297d7f88c: Verifying Checksum 2023-01-11T21:05:22.5192103Z bdf297d7f88c: Download complete 2023-01-11T21:05:23.1597723Z 0b3950af8ae1: Verifying Checksum 2023-01-11T21:05:23.1598127Z 0b3950af8ae1: Download complete 2023-01-11T21:05:23.2573328Z 6d68f7da8baa: Verifying Checksum 2023-01-11T21:05:23.2574024Z 6d68f7da8baa: Download complete 2023-01-11T21:05:23.3371048Z cca768f96df4: Verifying Checksum 2023-01-11T21:05:23.3371386Z cca768f96df4: Download complete 2023-01-11T21:05:24.9683912Z a1d16b6a5070: Verifying Checksum 2023-01-11T21:05:24.9684402Z a1d16b6a5070: Download complete 2023-01-11T21:05:25.0527730Z 61eecfa8b34e: Verifying Checksum 2023-01-11T21:05:25.0528091Z 61eecfa8b34e: Download complete 2023-01-11T21:05:25.1519595Z 95c1ac011645: Download complete 2023-01-11T21:05:25.2195708Z 3046cc00c4ca: Verifying Checksum 2023-01-11T21:05:25.2196510Z 3046cc00c4ca: Download complete 2023-01-11T21:05:25.2959099Z 195d560d8cf6: Verifying Checksum 2023-01-11T21:05:25.2959682Z 195d560d8cf6: Download complete 2023-01-11T21:05:25.9810325Z 77250abd5ca4: Verifying Checksum 2023-01-11T21:05:25.9810694Z 77250abd5ca4: Download complete 2023-01-11T21:05:26.0804426Z 881b24daf9c5: Verifying Checksum 2023-01-11T21:05:26.0805331Z 881b24daf9c5: Download complete 2023-01-11T21:05:27.8784998Z 9fbf0a18619e: Verifying Checksum 2023-01-11T21:05:27.8785363Z 9fbf0a18619e: Download complete 2023-01-11T21:05:27.9514041Z 02048a597c22: Download complete 2023-01-11T21:05:31.3862188Z 8c3cf3d5e1c5: Verifying Checksum 2023-01-11T21:05:31.4826715Z 502253a1be21: Pull complete 2023-01-11T21:05:31.5008734Z 3e03143da3c2: Verifying Checksum 2023-01-11T21:05:31.5009071Z 3e03143da3c2: Download complete 2023-01-11T21:05:31.5951295Z 5c7dd67e5809: Pull complete 2023-01-11T21:05:31.6954740Z 
bdfd23ed3f48: Pull complete 2023-01-11T21:05:31.7943782Z aee1dd761bdd: Pull complete 2023-01-11T21:05:53.7963529Z 5feda9af2542: Pull complete 2023-01-11T21:05:55.7383196Z f8371ecb849a: Pull complete 2023-01-11T21:05:57.6757882Z ce4a87d45645: Pull complete 2023-01-11T21:06:05.5760666Z 39629f7269f9: Pull complete 2023-01-11T21:06:07.4241034Z 87d0ffa55850: Pull complete 2023-01-11T21:06:09.2706690Z 70702f8b5bc4: Pull complete 2023-01-11T21:06:11.3096319Z 0c06be5c20e0: Pull complete 2023-01-11T21:06:12.6119462Z c7726d39d806: Verifying Checksum 2023-01-11T21:06:12.6119817Z c7726d39d806: Download complete 2023-01-11T21:06:15.3601320Z b372c2a3bc3f: Pull complete 2023-01-11T21:06:17.7289816Z 582d081a59fa: Pull complete 2023-01-11T21:06:19.8828453Z e1c655e7ec0e: Pull complete 2023-01-11T21:06:52.5955649Z c7726d39d806: Pull complete 2023-01-11T21:06:54.1328932Z 1c22f2f8c01b: Pull complete 2023-01-11T21:06:56.0092512Z b8f759fd0191: Pull complete 2023-01-11T21:06:57.9756804Z e28e73a4bddd: Pull complete 2023-01-11T21:06:59.8730013Z 90d8f9bbe048: Pull complete 2023-01-11T21:07:01.8149587Z 859052a25d95: Verifying Checksum 2023-01-11T21:07:01.8149913Z 859052a25d95: Download complete 2023-01-11T21:07:02.2301435Z b34bd39d0461: Pull complete 2023-01-11T21:07:04.0791232Z 2f2308643d60: Pull complete 2023-01-11T21:07:08.2991122Z 8e3432e5a569: Pull complete 2023-01-11T21:07:10.1759650Z 9ea746919509: Pull complete 2023-01-11T21:07:12.0086522Z 1a2fd7b216d7: Pull complete 2023-01-11T21:07:14.8861637Z 19fde6a723a0: Pull complete 2023-01-11T21:07:18.5094760Z 06369252d749: Pull complete 2023-01-11T21:07:21.1270981Z ea4bfeaa0fc7: Pull complete 2023-01-11T21:07:29.9624723Z a1d16b6a5070: Pull complete 2023-01-11T21:07:31.6860835Z f550b7ff2470: Pull complete 2023-01-11T21:07:33.5301849Z 12ddc57b99eb: Pull complete 2023-01-11T21:07:36.5725692Z 8345085fb0a0: Pull complete 2023-01-11T21:07:38.2515567Z 4cc94dbec031: Pull complete 2023-01-11T21:07:38.3800126Z 29a7c0d5fa4c: Pull complete 
2023-01-11T21:07:38.7699338Z 25571655d0e1: Pull complete 2023-01-11T21:07:38.8750153Z bdf297d7f88c: Pull complete 2023-01-11T21:07:40.2896419Z 0b3950af8ae1: Pull complete 2023-01-11T21:07:40.3937813Z 6d68f7da8baa: Pull complete 2023-01-11T21:07:40.4947509Z cca768f96df4: Pull complete 2023-01-11T21:07:46.5759825Z 8c3cf3d5e1c5: Pull complete 2023-01-11T21:07:48.3805056Z 61eecfa8b34e: Pull complete 2023-01-11T21:07:48.6064112Z 95c1ac011645: Pull complete 2023-01-11T21:07:48.7154929Z 3046cc00c4ca: Pull complete 2023-01-11T21:07:48.8123590Z 195d560d8cf6: Pull complete 2023-01-11T21:07:49.5875200Z 77250abd5ca4: Pull complete 2023-01-11T21:07:49.6881873Z 881b24daf9c5: Pull complete 2023-01-11T21:07:51.7140790Z 9fbf0a18619e: Pull complete 2023-01-11T21:07:51.8148597Z 02048a597c22: Pull complete 2023-01-11T21:08:33.5900435Z 859052a25d95: Pull complete 2023-01-11T21:08:35.4380280Z 3e03143da3c2: Pull complete 2023-01-11T21:08:36.7527828Z Digest: sha256:866df6c1171dbe014496717cf2080d6cc72ca611a4e8146525c9ef09640c8ba4 2023-01-11T21:08:37.2543007Z Status: Downloaded newer image for 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7:fd224c2e6c79d7fdec6408da598bf52bc5b201dd 2023-01-11T21:08:37.5360879Z 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7:fd224c2e6c79d7fdec6408da598bf52bc5b201dd 2023-01-11T21:08:37.5464465Z ##[group]Run pytorch/test-infra/.github/actions/setup-nvidia@main 2023-01-11T21:08:37.5464798Z with: 2023-01-11T21:08:37.5465044Z driver-version: 515.76 2023-01-11T21:08:37.5465284Z env: 2023-01-11T21:08:37.5465502Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:08:37.5465775Z ##[endgroup] 2023-01-11T21:08:37.6998962Z ##[group]Run nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482 2023-01-11T21:08:37.6999288Z with: 2023-01-11T21:08:37.6999504Z timeout_minutes: 10 2023-01-11T21:08:37.6999755Z max_attempts: 3 2023-01-11T21:08:37.7006503Z command: # Is it disgusting to 
have a full shell script here in this github action? Sure # But is it the best way to make it so that this action relies on nothing else? Absolutely set -eou pipefail DISTRIBUTION=$(. /etc/os-release;echo $ID$VERSION_ID) DRIVER_FN="NVIDIA-Linux-x86_64-${DRIVER_VERSION}.run" YUM_REPO_URL="https://nvidia.github.io/nvidia-docker/${DISTRIBUTION}/nvidia-docker.repo" install_nvidia_docker2_amzn2() { ( set -x # Needed for yum-config-manager sudo yum install -y yum-utils sudo yum-config-manager --add-repo "${YUM_REPO_URL}" sudo yum install -y nvidia-docker2 sudo systemctl restart docker ) } install_nvidia_docker2_ubuntu20() { ( set -x sudo apt-get install -y nvidia-docker2 sudo systemctl restart docker ) } pre_install_nvidia_driver_amzn2() { ( # Purge any nvidia driver installed from RHEL repo sudo yum remove -y nvidia-driver-latest-dkms ) } install_nvidia_driver_common() { ( # Try to gather more information about the runner and its existing NVIDIA driver if any echo "Before installing NVIDIA driver" lspci lsmod modinfo nvidia || true HAS_NVIDIA_DRIVER=0 # Check if NVIDIA driver has already been installed if [ -x "$(command -v nvidia-smi)" ]; then set +e # The driver exists, check its version next. Also check only the first GPU if there are more than one of them # so that the same driver version is not print over multiple lines INSTALLED_DRIVER_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0) NVIDIA_SMI_STATUS=$? if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then echo "Failed to get NVIDIA driver version ($INSTALLED_DRIVER_VERSION). Continuing" elif [ "$INSTALLED_DRIVER_VERSION" != "$DRIVER_VERSION" ]; then echo "NVIDIA driver ($INSTALLED_DRIVER_VERSION) has been installed, but we expect to have $DRIVER_VERSION instead. Continuing" else HAS_NVIDIA_DRIVER=1 echo "NVIDIA driver ($INSTALLED_DRIVER_VERSION) has already been installed. 
Skipping NVIDIA driver installation" fi set -e fi if [ "$HAS_NVIDIA_DRIVER" -eq 0 ]; then # CAUTION: this may need to be updated in future if [ "${DISTRIBUTION}" != ubuntu20.04 ]; then sudo yum groupinstall -y "Development Tools" # ensure our kernel install is the same as our underlying kernel, # groupinstall "Development Tools" has a habit of mismatching kernel headers sudo yum install -y "kernel-devel-uname-r == $(uname -r)" sudo modprobe backlight fi sudo curl -fsL -o /tmp/nvidia_driver "https://s3.amazonaws.com/ossci-linux/nvidia_driver/$DRIVER_FN" set +e sudo /bin/bash /tmp/nvidia_driver -s --no-drm NVIDIA_INSTALLATION_STATUS=$? RESET_GPU=0 if [ "$NVIDIA_INSTALLATION_STATUS" -ne 0 ]; then sudo cat /var/log/nvidia-installer.log # Fail to install NVIDIA driver, try to reset the GPU RESET_GPU=1 elif [ -x "$(command -v nvidia-smi)" ]; then # Check again if nvidia-smi works even if the driver installation completes successfully INSTALLED_DRIVER_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0) NVIDIA_SMI_STATUS=$? if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then RESET_GPU=1 fi fi if [ "$RESET_GPU" -eq 1 ]; then NVIDIA_DEVICES=$(lspci -D | grep -i NVIDIA | cut -d' ' -f1) # The GPU can get stuck in a failure state if somehow the test crashs the GPU microcode. When this # happens, we'll try to reset all NVIDIA devices https://github.com/pytorch/pytorch/issues/88388 for PCI_ID in $NVIDIA_DEVICES; do DEVICE_ENABLED=$(cat /sys/bus/pci/devices/$PCI_ID/enable) echo "Reseting $PCI_ID (enabled state: $DEVICE_ENABLED)" # This requires sudo permission of course echo "1" | sudo tee /sys/bus/pci/devices/$PCI_ID/reset sleep 1 done fi sudo rm -fv /tmp/nvidia_driver set -e fi ) } post_install_nvidia_driver_common() { ( sudo modprobe nvidia || true echo "After installing NVIDIA driver" lspci lsmod modinfo nvidia || true ( set +e nvidia-smi NVIDIA_SMI_STATUS=$? 
# Allowable exit statuses for nvidia-smi, see: https://github.com/NVIDIA/gpu-operator/issues/285 if [ "$NVIDIA_SMI_STATUS" -eq 0 ] || [ "$NVIDIA_SMI_STATUS" -eq 14 ]; then echo "INFO: Ignoring allowed status ${NVIDIA_SMI_STATUS}" else echo "ERROR: nvidia-smi exited with unresolved status ${NVIDIA_SMI_STATUS}" exit ${NVIDIA_SMI_STATUS} fi set -e ) ) } install_nvidia_driver_amzn2() { ( set -x pre_install_nvidia_driver_amzn2 install_nvidia_driver_common post_install_nvidia_driver_common ) } install_nvidia_driver_ubuntu20() { ( set -x install_nvidia_driver_common post_install_nvidia_driver_common ) } echo "== Installing nvidia driver ${DRIVER_FN} ==" case "${DISTRIBUTION}" in amzn*) install_nvidia_driver_amzn2 ;; ubuntu20.04) install_nvidia_driver_ubuntu20 ;; *) echo "ERROR: Unknown distribution ${DISTRIBUTION}" exit 1 ;; esac # Install container toolkit based on distribution echo "== Installing nvidia container toolkit for ${DISTRIBUTION} ==" case "${DISTRIBUTION}" in amzn*) install_nvidia_docker2_amzn2 ;; ubuntu20.04) install_nvidia_docker2_ubuntu20 ;; *) echo "ERROR: Unknown distribution ${DISTRIBUTION}" exit 1 ;; esac echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" 2023-01-11T21:08:37.7013424Z retry_wait_seconds: 10 2023-01-11T21:08:37.7013707Z polling_interval_seconds: 1 2023-01-11T21:08:37.7013985Z warning_on_retry: true 2023-01-11T21:08:37.7014232Z continue_on_error: false 2023-01-11T21:08:37.7014477Z env: 2023-01-11T21:08:37.7014716Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:08:37.7014968Z DRIVER_VERSION: 515.76 2023-01-11T21:08:37.7015296Z ##[endgroup] 2023-01-11T21:08:37.7686381Z == Installing nvidia driver NVIDIA-Linux-x86_64-515.76.run == 2023-01-11T21:08:37.7688452Z + pre_install_nvidia_driver_amzn2 2023-01-11T21:08:37.7688866Z + sudo yum remove -y nvidia-driver-latest-dkms 2023-01-11T21:08:38.2387857Z Loaded plugins: extras_suggestions, langpacks, priorities, update-motd 2023-01-11T21:08:38.2815199Z No Match for argument: nvidia-driver-latest-dkms 
2023-01-11T21:08:38.3103943Z No Packages marked for removal 2023-01-11T21:08:38.3256640Z + install_nvidia_driver_common 2023-01-11T21:08:38.3261116Z + echo 'Before installing NVIDIA driver' 2023-01-11T21:08:38.3261746Z + lspci 2023-01-11T21:08:38.3263820Z Before installing NVIDIA driver 2023-01-11T21:08:39.5193895Z 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02) 2023-01-11T21:08:39.5194387Z 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] 2023-01-11T21:08:39.5194802Z 00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II] 2023-01-11T21:08:39.5195188Z 00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 01) 2023-01-11T21:08:39.5195555Z 00:02.0 VGA compatible controller: Cirrus Logic GD 5446 2023-01-11T21:08:39.5199495Z 00:03.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA) 2023-01-11T21:08:39.5199937Z 00:1d.0 VGA compatible controller: NVIDIA Corporation GM204GL [Tesla M60] (rev a1) 2023-01-11T21:08:39.5200370Z 00:1e.0 VGA compatible controller: NVIDIA Corporation GM204GL [Tesla M60] (rev a1) 2023-01-11T21:08:39.5200788Z 00:1f.0 Unassigned class [ff80]: XenSource, Inc. 
Xen Platform Device (rev 01) 2023-01-11T21:08:39.5201100Z + lsmod 2023-01-11T21:08:39.5217437Z Module Size Used by 2023-01-11T21:08:39.5218038Z xt_conntrack 16384 1 2023-01-11T21:08:39.5218557Z ipt_MASQUERADE 16384 1 2023-01-11T21:08:39.5219150Z nf_nat_masquerade_ipv4 16384 1 ipt_MASQUERADE 2023-01-11T21:08:39.5219445Z nf_conntrack_netlink 49152 0 2023-01-11T21:08:39.5219747Z nfnetlink 16384 2 nf_conntrack_netlink 2023-01-11T21:08:39.5220038Z xfrm_user 45056 1 2023-01-11T21:08:39.5220290Z xfrm_algo 16384 1 xfrm_user 2023-01-11T21:08:39.5220569Z xt_addrtype 16384 2 2023-01-11T21:08:39.5220834Z iptable_filter 16384 1 2023-01-11T21:08:39.5221098Z iptable_nat 16384 1 2023-01-11T21:08:39.5221345Z nf_conntrack_ipv4 16384 3 2023-01-11T21:08:39.5221907Z nf_defrag_ipv4 16384 1 nf_conntrack_ipv4 2023-01-11T21:08:39.5222439Z nf_nat_ipv4 16384 1 iptable_nat 2023-01-11T21:08:39.5222894Z nf_nat 36864 2 nf_nat_masquerade_ipv4,nf_nat_ipv4 2023-01-11T21:08:39.5223837Z nf_conntrack 155648 7 xt_conntrack,nf_nat_masquerade_ipv4,nf_conntrack_ipv4,nf_nat,ipt_MASQUERADE,nf_nat_ipv4,nf_conntrack_netlink 2023-01-11T21:08:39.5224614Z br_netfilter 24576 0 2023-01-11T21:08:39.5225071Z bridge 172032 1 br_netfilter 2023-01-11T21:08:39.5225395Z stp 16384 1 bridge 2023-01-11T21:08:39.5225951Z llc 16384 2 bridge,stp 2023-01-11T21:08:39.5226455Z overlay 86016 0 2023-01-11T21:08:39.5226916Z sunrpc 393216 1 2023-01-11T21:08:39.5227274Z dm_mirror 28672 0 2023-01-11T21:08:39.5227547Z dm_region_hash 20480 1 dm_mirror 2023-01-11T21:08:39.5227832Z dm_log 20480 2 dm_region_hash,dm_mirror 2023-01-11T21:08:39.5228298Z dm_mod 143360 2 dm_log,dm_mirror 2023-01-11T21:08:39.5228692Z dax 69632 1 dm_mod 2023-01-11T21:08:39.5228994Z sb_edac 24576 0 2023-01-11T21:08:39.5229465Z crc32_pclmul 16384 0 2023-01-11T21:08:39.5229781Z ghash_clmulni_intel 16384 0 2023-01-11T21:08:39.5230033Z pcbc 16384 0 2023-01-11T21:08:39.5230290Z aesni_intel 188416 0 2023-01-11T21:08:39.5230560Z aes_x86_64 20480 1 aesni_intel 
2023-01-11T21:08:39.5230804Z ata_piix 36864 0 2023-01-11T21:08:39.5231261Z crypto_simd 16384 1 aesni_intel 2023-01-11T21:08:39.5231548Z glue_helper 16384 1 aesni_intel 2023-01-11T21:08:39.5231867Z cryptd 28672 3 crypto_simd,ghash_clmulni_intel,aesni_intel 2023-01-11T21:08:39.5232192Z pcc_cpufreq 16384 0 2023-01-11T21:08:39.5232462Z libata 266240 1 ata_piix 2023-01-11T21:08:39.5232724Z mousedev 24576 0 2023-01-11T21:08:39.5233100Z evdev 20480 3 2023-01-11T21:08:39.5233588Z scsi_mod 245760 1 libata 2023-01-11T21:08:39.5233872Z psmouse 32768 0 2023-01-11T21:08:39.5234105Z button 16384 0 2023-01-11T21:08:39.5234445Z ena 114688 0 2023-01-11T21:08:39.5234714Z xen_blkfront 49152 2 2023-01-11T21:08:39.5235159Z crc32c_intel 24576 0 2023-01-11T21:08:39.5235512Z autofs4 49152 2 2023-01-11T21:08:39.5235974Z + modinfo nvidia 2023-01-11T21:08:39.5236391Z modinfo: ERROR: Module nvidia not found. 2023-01-11T21:08:39.5236664Z + true 2023-01-11T21:08:39.5236895Z + HAS_NVIDIA_DRIVER=0 2023-01-11T21:08:39.5237245Z ++ command -v nvidia-smi 2023-01-11T21:08:39.5237527Z + '[' -x '' ']' 2023-01-11T21:08:39.5237793Z + '[' 0 -eq 0 ']' 2023-01-11T21:08:39.5238074Z + '[' amzn2 '!=' ubuntu20.04 ']' 2023-01-11T21:08:39.5238436Z + sudo yum groupinstall -y 'Development Tools' 2023-01-11T21:08:39.9921998Z Loaded plugins: extras_suggestions, langpacks, priorities, update-motd 2023-01-11T21:08:40.3018755Z Resolving Dependencies 2023-01-11T21:08:40.3024948Z --> Running transaction check 2023-01-11T21:08:40.3027945Z ---> Package autoconf.noarch 0:2.69-11.amzn2 will be installed 2023-01-11T21:08:40.3241386Z --> Processing Dependency: m4 >= 1.4.14 for package: autoconf-2.69-11.amzn2.noarch 2023-01-11T21:08:40.5466994Z --> Processing Dependency: perl(Data::Dumper) for package: autoconf-2.69-11.amzn2.noarch 2023-01-11T21:08:40.5470301Z ---> Package automake.noarch 0:1.13.4-3.1.amzn2 will be installed 2023-01-11T21:08:40.5517006Z --> Processing Dependency: perl(Thread::Queue) for package: 
automake-1.13.4-3.1.amzn2.noarch 2023-01-11T21:08:40.5524380Z --> Processing Dependency: perl(TAP::Parser) for package: automake-1.13.4-3.1.amzn2.noarch 2023-01-11T21:08:40.5535391Z ---> Package bison.x86_64 0:3.0.4-6.amzn2.0.2 will be installed 2023-01-11T21:08:40.5656457Z ---> Package byacc.x86_64 0:1.9.20130304-3.amzn2.0.2 will be installed 2023-01-11T21:08:40.5664624Z ---> Package cscope.x86_64 0:15.8-10.amzn2.0.2 will be installed 2023-01-11T21:08:40.5711190Z --> Processing Dependency: emacs-filesystem for package: cscope-15.8-10.amzn2.0.2.x86_64 2023-01-11T21:08:40.5735999Z ---> Package ctags.x86_64 0:5.8-13.amzn2.0.2 will be installed 2023-01-11T21:08:40.5747108Z ---> Package diffstat.x86_64 0:1.57-4.amzn2.0.2 will be installed 2023-01-11T21:08:40.5755000Z ---> Package doxygen.x86_64 1:1.8.5-4.amzn2 will be installed 2023-01-11T21:08:40.5857988Z ---> Package elfutils.x86_64 0:0.176-2.amzn2 will be installed 2023-01-11T21:08:40.5999612Z ---> Package flex.x86_64 0:2.5.37-3.amzn2.0.3 will be installed 2023-01-11T21:08:40.6019169Z ---> Package gcc.x86_64 0:7.3.1-15.amzn2 will be installed 2023-01-11T21:08:40.6199529Z --> Processing Dependency: cpp = 7.3.1-15.amzn2 for package: gcc-7.3.1-15.amzn2.x86_64 2023-01-11T21:08:40.6220309Z --> Processing Dependency: libsanitizer >= 7.3.1-15.amzn2 for package: gcc-7.3.1-15.amzn2.x86_64 2023-01-11T21:08:40.6277584Z --> Processing Dependency: libquadmath >= 7.3.1-15.amzn2 for package: gcc-7.3.1-15.amzn2.x86_64 2023-01-11T21:08:40.6331792Z --> Processing Dependency: libmpx >= 7.3.1-15.amzn2 for package: gcc-7.3.1-15.amzn2.x86_64 2023-01-11T21:08:40.6389095Z --> Processing Dependency: libitm >= 7.3.1-15.amzn2 for package: gcc-7.3.1-15.amzn2.x86_64 2023-01-11T21:08:40.6444808Z --> Processing Dependency: libcilkrts >= 7.3.1-15.amzn2 for package: gcc-7.3.1-15.amzn2.x86_64 2023-01-11T21:08:40.6502024Z --> Processing Dependency: libatomic >= 7.3.1-15.amzn2 for package: gcc-7.3.1-15.amzn2.x86_64 2023-01-11T21:08:40.6558533Z --> 
Processing Dependency: glibc-devel >= 2.2.90-12 for package: gcc-7.3.1-15.amzn2.x86_64 2023-01-11T21:08:40.6726528Z --> Processing Dependency: libmpfr.so.4()(64bit) for package: gcc-7.3.1-15.amzn2.x86_64 2023-01-11T21:08:40.6748825Z --> Processing Dependency: libmpc.so.3()(64bit) for package: gcc-7.3.1-15.amzn2.x86_64 2023-01-11T21:08:40.6770688Z ---> Package gcc-c++.x86_64 0:7.3.1-15.amzn2 will be installed 2023-01-11T21:08:40.6798149Z ---> Package gcc-gfortran.x86_64 0:7.3.1-15.amzn2 will be installed 2023-01-11T21:08:40.6833830Z --> Processing Dependency: libgfortran.so.4()(64bit) for package: gcc-gfortran-7.3.1-15.amzn2.x86_64 2023-01-11T21:08:40.6904644Z ---> Package indent.x86_64 0:2.2.11-13.amzn2.0.2 will be installed 2023-01-11T21:08:40.6919492Z ---> Package intltool.noarch 0:0.50.2-7.amzn2 will be installed 2023-01-11T21:08:40.6966864Z --> Processing Dependency: perl(XML::Parser) for package: intltool-0.50.2-7.amzn2.noarch 2023-01-11T21:08:40.6982895Z --> Processing Dependency: gettext-devel for package: intltool-0.50.2-7.amzn2.noarch 2023-01-11T21:08:40.7002292Z ---> Package libtool.x86_64 0:2.4.2-22.2.amzn2.0.2 will be installed 2023-01-11T21:08:40.7032900Z ---> Package patch.x86_64 0:2.7.1-12.amzn2.0.2 will be installed 2023-01-11T21:08:40.7076796Z ---> Package patchutils.x86_64 0:0.3.3-4.amzn2.0.1 will be installed 2023-01-11T21:08:40.7100333Z ---> Package rcs.x86_64 0:5.9.0-5.amzn2.0.2 will be installed 2023-01-11T21:08:40.7133516Z ---> Package rpm-build.x86_64 0:4.11.3-48.amzn2.0.2 will be installed 2023-01-11T21:08:40.7372627Z --> Processing Dependency: /usr/bin/gdb-add-index for package: rpm-build-4.11.3-48.amzn2.0.2.x86_64 2023-01-11T21:08:40.7391378Z ---> Package rpm-sign.x86_64 0:4.11.3-48.amzn2.0.2 will be installed 2023-01-11T21:08:40.7414733Z ---> Package subversion.x86_64 0:1.7.14-16.amzn2.0.1 will be installed 2023-01-11T21:08:40.7595886Z --> Processing Dependency: subversion-libs(x86-64) = 1.7.14-16.amzn2.0.1 for package: 
subversion-1.7.14-16.amzn2.0.1.x86_64 2023-01-11T21:08:40.7616444Z --> Processing Dependency: libsvn_wc-1.so.0()(64bit) for package: subversion-1.7.14-16.amzn2.0.1.x86_64 2023-01-11T21:08:40.7618133Z --> Processing Dependency: libsvn_subr-1.so.0()(64bit) for package: subversion-1.7.14-16.amzn2.0.1.x86_64 2023-01-11T21:08:40.7618800Z --> Processing Dependency: libsvn_repos-1.so.0()(64bit) for package: subversion-1.7.14-16.amzn2.0.1.x86_64 2023-01-11T21:08:40.7619964Z --> Processing Dependency: libsvn_ra_svn-1.so.0()(64bit) for package: subversion-1.7.14-16.amzn2.0.1.x86_64 2023-01-11T21:08:40.7621052Z --> Processing Dependency: libsvn_ra_neon-1.so.0()(64bit) for package: subversion-1.7.14-16.amzn2.0.1.x86_64 2023-01-11T21:08:40.7622090Z --> Processing Dependency: libsvn_ra_local-1.so.0()(64bit) for package: subversion-1.7.14-16.amzn2.0.1.x86_64 2023-01-11T21:08:40.7623005Z --> Processing Dependency: libsvn_ra-1.so.0()(64bit) for package: subversion-1.7.14-16.amzn2.0.1.x86_64 2023-01-11T21:08:40.7623815Z --> Processing Dependency: libsvn_fs_util-1.so.0()(64bit) for package: subversion-1.7.14-16.amzn2.0.1.x86_64 2023-01-11T21:08:40.7624879Z --> Processing Dependency: libsvn_fs_fs-1.so.0()(64bit) for package: subversion-1.7.14-16.amzn2.0.1.x86_64 2023-01-11T21:08:40.7625716Z --> Processing Dependency: libsvn_fs_base-1.so.0()(64bit) for package: subversion-1.7.14-16.amzn2.0.1.x86_64 2023-01-11T21:08:40.7626602Z --> Processing Dependency: libsvn_fs-1.so.0()(64bit) for package: subversion-1.7.14-16.amzn2.0.1.x86_64 2023-01-11T21:08:40.7627478Z --> Processing Dependency: libsvn_diff-1.so.0()(64bit) for package: subversion-1.7.14-16.amzn2.0.1.x86_64 2023-01-11T21:08:40.7628111Z --> Processing Dependency: libsvn_delta-1.so.0()(64bit) for package: subversion-1.7.14-16.amzn2.0.1.x86_64 2023-01-11T21:08:40.7628729Z --> Processing Dependency: libsvn_client-1.so.0()(64bit) for package: subversion-1.7.14-16.amzn2.0.1.x86_64 2023-01-11T21:08:40.7629332Z --> Processing Dependency: 
libneon.so.27()(64bit) for package: subversion-1.7.14-16.amzn2.0.1.x86_64 2023-01-11T21:08:40.7646969Z --> Processing Dependency: libaprutil-1.so.0()(64bit) for package: subversion-1.7.14-16.amzn2.0.1.x86_64 2023-01-11T21:08:40.7668388Z --> Processing Dependency: libapr-1.so.0()(64bit) for package: subversion-1.7.14-16.amzn2.0.1.x86_64 2023-01-11T21:08:40.7693519Z ---> Package swig.x86_64 0:3.0.12-11.amzn2.0.3 will be installed 2023-01-11T21:08:40.7714842Z ---> Package system-rpm-config.noarch 0:9.1.0-76.amzn2.0.14 will be installed 2023-01-11T21:08:40.7763724Z --> Processing Dependency: dwz >= 0.4 for package: system-rpm-config-9.1.0-76.amzn2.0.14.noarch 2023-01-11T21:08:40.7781894Z --> Processing Dependency: perl-srpm-macros for package: system-rpm-config-9.1.0-76.amzn2.0.14.noarch 2023-01-11T21:08:40.7794719Z --> Processing Dependency: go-srpm-macros for package: system-rpm-config-9.1.0-76.amzn2.0.14.noarch 2023-01-11T21:08:40.7976090Z ---> Package systemtap.x86_64 0:4.5-1.amzn2.0.1 will be installed 2023-01-11T21:08:40.7990779Z --> Processing Dependency: systemtap-devel = 4.5-1.amzn2.0.1 for package: systemtap-4.5-1.amzn2.0.1.x86_64 2023-01-11T21:08:40.8012974Z --> Processing Dependency: systemtap-client = 4.5-1.amzn2.0.1 for package: systemtap-4.5-1.amzn2.0.1.x86_64 2023-01-11T21:08:40.8027816Z --> Running transaction check 2023-01-11T21:08:40.8031139Z ---> Package apr.x86_64 0:1.7.0-9.amzn2 will be installed 2023-01-11T21:08:40.8108749Z ---> Package apr-util.x86_64 0:1.6.1-5.amzn2.0.2 will be installed 2023-01-11T21:08:40.8147862Z --> Processing Dependency: apr-util-bdb(x86-64) = 1.6.1-5.amzn2.0.2 for package: apr-util-1.6.1-5.amzn2.0.2.x86_64 2023-01-11T21:08:40.8162934Z ---> Package cpp.x86_64 0:7.3.1-15.amzn2 will be installed 2023-01-11T21:08:40.8236057Z ---> Package dwz.x86_64 0:0.11-3.amzn2.0.3 will be installed 2023-01-11T21:08:40.8246599Z ---> Package emacs-filesystem.noarch 1:27.2-4.amzn2.0.1 will be installed 2023-01-11T21:08:40.8247683Z ---> 
Package gdb.x86_64 0:8.0.1-36.amzn2.0.1 will be installed 2023-01-11T21:08:40.8319760Z ---> Package gettext-devel.x86_64 0:0.19.8.1-3.amzn2 will be installed 2023-01-11T21:08:40.8380795Z --> Processing Dependency: gettext-common-devel = 0.19.8.1-3.amzn2 for package: gettext-devel-0.19.8.1-3.amzn2.x86_64 2023-01-11T21:08:40.8390279Z ---> Package glibc-devel.x86_64 0:2.26-62.amzn2 will be installed 2023-01-11T21:08:40.8515588Z --> Processing Dependency: glibc-headers = 2.26-62.amzn2 for package: glibc-devel-2.26-62.amzn2.x86_64 2023-01-11T21:08:40.8545504Z --> Processing Dependency: glibc-headers for package: glibc-devel-2.26-62.amzn2.x86_64 2023-01-11T21:08:40.8546156Z ---> Package go-srpm-macros.noarch 0:3.0.15-23.amzn2.0.2 will be installed 2023-01-11T21:08:40.8551314Z ---> Package libatomic.x86_64 0:7.3.1-15.amzn2 will be installed 2023-01-11T21:08:40.8565046Z ---> Package libcilkrts.x86_64 0:7.3.1-15.amzn2 will be installed 2023-01-11T21:08:40.8593655Z ---> Package libgfortran.x86_64 0:7.3.1-15.amzn2 will be installed 2023-01-11T21:08:40.8630356Z ---> Package libitm.x86_64 0:7.3.1-15.amzn2 will be installed 2023-01-11T21:08:40.8646744Z ---> Package libmpc.x86_64 0:1.0.1-3.amzn2.0.2 will be installed 2023-01-11T21:08:40.8660467Z ---> Package libmpx.x86_64 0:7.3.1-15.amzn2 will be installed 2023-01-11T21:08:40.8675678Z ---> Package libquadmath.x86_64 0:7.3.1-15.amzn2 will be installed 2023-01-11T21:08:40.8701982Z ---> Package libsanitizer.x86_64 0:7.3.1-15.amzn2 will be installed 2023-01-11T21:08:40.8748813Z ---> Package m4.x86_64 0:1.4.16-10.amzn2.0.2 will be installed 2023-01-11T21:08:40.8764568Z ---> Package mpfr.x86_64 0:3.1.1-4.amzn2.0.2 will be installed 2023-01-11T21:08:40.8786966Z ---> Package neon.x86_64 0:0.30.0-3.amzn2.0.2 will be installed 2023-01-11T21:08:40.8865356Z --> Processing Dependency: libgnutls.so.28(GNUTLS_2_12)(64bit) for package: neon-0.30.0-3.amzn2.0.2.x86_64 2023-01-11T21:08:40.8905137Z --> Processing Dependency: 
libgnutls.so.28(GNUTLS_1_4)(64bit) for package: neon-0.30.0-3.amzn2.0.2.x86_64 2023-01-11T21:08:40.8906188Z --> Processing Dependency: libproxy.so.1()(64bit) for package: neon-0.30.0-3.amzn2.0.2.x86_64 2023-01-11T21:08:40.8925947Z --> Processing Dependency: libpakchois.so.0()(64bit) for package: neon-0.30.0-3.amzn2.0.2.x86_64 2023-01-11T21:08:40.8944430Z --> Processing Dependency: libgnutls.so.28()(64bit) for package: neon-0.30.0-3.amzn2.0.2.x86_64 2023-01-11T21:08:40.8950878Z ---> Package perl-Data-Dumper.x86_64 0:2.145-3.amzn2.0.2 will be installed 2023-01-11T21:08:40.8999084Z ---> Package perl-Test-Harness.noarch 0:3.28-3.amzn2 will be installed 2023-01-11T21:08:40.9097605Z ---> Package perl-Thread-Queue.noarch 0:3.02-2.amzn2 will be installed 2023-01-11T21:08:40.9110829Z ---> Package perl-XML-Parser.x86_64 0:2.41-10.amzn2.0.2 will be installed 2023-01-11T21:08:40.9126065Z ---> Package perl-srpm-macros.noarch 0:1-8.amzn2.0.1 will be installed 2023-01-11T21:08:40.9127168Z ---> Package subversion-libs.x86_64 0:1.7.14-16.amzn2.0.1 will be installed 2023-01-11T21:08:40.9156090Z ---> Package systemtap-client.x86_64 0:4.5-1.amzn2.0.1 will be installed 2023-01-11T21:08:40.9372971Z --> Processing Dependency: mokutil for package: systemtap-client-4.5-1.amzn2.0.1.x86_64 2023-01-11T21:08:40.9387676Z --> Processing Dependency: libavahi-common.so.3()(64bit) for package: systemtap-client-4.5-1.amzn2.0.1.x86_64 2023-01-11T21:08:40.9414491Z --> Processing Dependency: libavahi-client.so.3()(64bit) for package: systemtap-client-4.5-1.amzn2.0.1.x86_64 2023-01-11T21:08:40.9415080Z ---> Package systemtap-devel.x86_64 0:4.5-1.amzn2.0.1 will be installed 2023-01-11T21:08:40.9537686Z --> Processing Dependency: kernel-devel-uname-r for package: systemtap-devel-4.5-1.amzn2.0.1.x86_64 2023-01-11T21:08:41.0615580Z --> Running transaction check 2023-01-11T21:08:41.0616469Z ---> Package apr-util-bdb.x86_64 0:1.6.1-5.amzn2.0.2 will be installed 2023-01-11T21:08:41.0626829Z ---> Package 
avahi-libs.x86_64 0:0.6.31-20.amzn2 will be installed 2023-01-11T21:08:41.0653637Z ---> Package gettext-common-devel.noarch 0:0.19.8.1-3.amzn2 will be installed 2023-01-11T21:08:41.0654683Z ---> Package glibc-headers.x86_64 0:2.26-62.amzn2 will be installed 2023-01-11T21:08:41.0734714Z --> Processing Dependency: kernel-headers >= 2.2.1 for package: glibc-headers-2.26-62.amzn2.x86_64 2023-01-11T21:08:41.1891256Z --> Processing Dependency: kernel-headers for package: glibc-headers-2.26-62.amzn2.x86_64 2023-01-11T21:08:41.1892055Z ---> Package gnutls.x86_64 0:3.3.29-9.amzn2.0.1 will be installed 2023-01-11T21:08:41.1960246Z --> Processing Dependency: trousers >= 0.3.11.2 for package: gnutls-3.3.29-9.amzn2.0.1.x86_64 2023-01-11T21:08:41.1986437Z ---> Package kernel-devel.x86_64 0:4.14.301-224.520.amzn2 will be installed 2023-01-11T21:08:41.2014532Z --> Processing Dependency: elfutils-libelf-devel for package: kernel-devel-4.14.301-224.520.amzn2.x86_64 2023-01-11T21:08:41.2035362Z ---> Package libproxy.x86_64 0:0.4.11-10.amzn2.0.3 will be installed 2023-01-11T21:08:41.2063847Z --> Processing Dependency: libmodman.so.1()(64bit) for package: libproxy-0.4.11-10.amzn2.0.3.x86_64 2023-01-11T21:08:41.2083024Z ---> Package mokutil.x86_64 1:0.3.0-10.amzn2.0.1 will be installed 2023-01-11T21:08:41.2132837Z --> Processing Dependency: libefivar.so.1(libefivar.so.0)(64bit) for package: 1:mokutil-0.3.0-10.amzn2.0.1.x86_64 2023-01-11T21:08:41.2154119Z --> Processing Dependency: libefivar.so.1(LIBEFIVAR_0.24)(64bit) for package: 1:mokutil-0.3.0-10.amzn2.0.1.x86_64 2023-01-11T21:08:41.2155568Z --> Processing Dependency: libefivar.so.1()(64bit) for package: 1:mokutil-0.3.0-10.amzn2.0.1.x86_64 2023-01-11T21:08:41.2156103Z ---> Package pakchois.x86_64 0:0.4-10.amzn2.0.2 will be installed 2023-01-11T21:08:41.2169629Z --> Running transaction check 2023-01-11T21:08:41.2170307Z ---> Package efivar-libs.x86_64 0:31-4.amzn2.0.4 will be installed 2023-01-11T21:08:41.2188533Z ---> Package 
elfutils-libelf-devel.x86_64 0:0.176-2.amzn2 will be installed 2023-01-11T21:08:41.2201278Z --> Processing Dependency: pkgconfig(zlib) for package: elfutils-libelf-devel-0.176-2.amzn2.x86_64 2023-01-11T21:08:41.2229627Z ---> Package kernel-headers.x86_64 0:4.14.301-224.520.amzn2 will be installed 2023-01-11T21:08:41.2230687Z ---> Package libmodman.x86_64 0:2.0.1-8.amzn2.0.2 will be installed 2023-01-11T21:08:41.2256875Z ---> Package trousers.x86_64 0:0.3.14-2.amzn2.0.2 will be installed 2023-01-11T21:08:41.2317192Z --> Running transaction check 2023-01-11T21:08:41.2317623Z ---> Package zlib-devel.x86_64 0:1.2.7-19.amzn2.0.2 will be installed 2023-01-11T21:08:41.4920614Z --> Finished Dependency Resolution 2023-01-11T21:08:41.5681012Z 2023-01-11T21:08:41.5681550Z Dependencies Resolved 2023-01-11T21:08:41.5803616Z 2023-01-11T21:08:41.5803937Z ================================================================================ 2023-01-11T21:08:41.5804302Z Package Arch Version Repository Size 2023-01-11T21:08:41.5804832Z ================================================================================ 2023-01-11T21:08:41.5805155Z Installing for group install "Development Tools": 2023-01-11T21:08:41.5805714Z autoconf noarch 2.69-11.amzn2 amzn2-core 701 k 2023-01-11T21:08:41.5806176Z automake noarch 1.13.4-3.1.amzn2 amzn2-core 679 k 2023-01-11T21:08:41.5806595Z bison x86_64 3.0.4-6.amzn2.0.2 amzn2-core 674 k 2023-01-11T21:08:41.5807020Z byacc x86_64 1.9.20130304-3.amzn2.0.2 amzn2-core 66 k 2023-01-11T21:08:41.5807450Z cscope x86_64 15.8-10.amzn2.0.2 amzn2-core 204 k 2023-01-11T21:08:41.5807858Z ctags x86_64 5.8-13.amzn2.0.2 amzn2-core 157 k 2023-01-11T21:08:41.5808286Z diffstat x86_64 1.57-4.amzn2.0.2 amzn2-core 35 k 2023-01-11T21:08:41.5808722Z doxygen x86_64 1:1.8.5-4.amzn2 amzn2-core 3.5 M 2023-01-11T21:08:41.5809153Z elfutils x86_64 0.176-2.amzn2 amzn2-core 307 k 2023-01-11T21:08:41.5809562Z flex x86_64 2.5.37-3.amzn2.0.3 amzn2-core 291 k 2023-01-11T21:08:41.5809977Z gcc 
x86_64 7.3.1-15.amzn2 amzn2-core 22 M 2023-01-11T21:08:41.5810400Z gcc-c++ x86_64 7.3.1-15.amzn2 amzn2-core 13 M 2023-01-11T21:08:41.5810876Z gcc-gfortran x86_64 7.3.1-15.amzn2 amzn2-core 11 M 2023-01-11T21:08:41.5811294Z indent x86_64 2.2.11-13.amzn2.0.2 amzn2-core 150 k 2023-01-11T21:08:41.5811715Z intltool noarch 0.50.2-7.amzn2 amzn2-core 59 k 2023-01-11T21:08:41.5812143Z libtool x86_64 2.4.2-22.2.amzn2.0.2 amzn2-core 588 k 2023-01-11T21:08:41.5812565Z patch x86_64 2.7.1-12.amzn2.0.2 amzn2-core 110 k 2023-01-11T21:08:41.5812982Z patchutils x86_64 0.3.3-4.amzn2.0.1 amzn2-core 104 k 2023-01-11T21:08:41.5813403Z rcs x86_64 5.9.0-5.amzn2.0.2 amzn2-core 231 k 2023-01-11T21:08:41.5813823Z rpm-build x86_64 4.11.3-48.amzn2.0.2 amzn2-core 150 k 2023-01-11T21:08:41.5814241Z rpm-sign x86_64 4.11.3-48.amzn2.0.2 amzn2-core 50 k 2023-01-11T21:08:41.5814667Z subversion x86_64 1.7.14-16.amzn2.0.1 amzn2-core 1.0 M 2023-01-11T21:08:41.5815089Z swig x86_64 3.0.12-11.amzn2.0.3 amzn2-core 1.4 M 2023-01-11T21:08:41.5815515Z system-rpm-config noarch 9.1.0-76.amzn2.0.14 amzn2-core 90 k 2023-01-11T21:08:41.5815963Z systemtap x86_64 4.5-1.amzn2.0.1 amzn2-core 12 k 2023-01-11T21:08:41.5816277Z Installing for dependencies: 2023-01-11T21:08:41.5817007Z apr x86_64 1.7.0-9.amzn2 amzn2-core 122 k 2023-01-11T21:08:41.5817431Z apr-util x86_64 1.6.1-5.amzn2.0.2 amzn2-core 99 k 2023-01-11T21:08:41.5817866Z apr-util-bdb x86_64 1.6.1-5.amzn2.0.2 amzn2-core 19 k 2023-01-11T21:08:41.5818304Z avahi-libs x86_64 0.6.31-20.amzn2 amzn2-core 61 k 2023-01-11T21:08:41.5818843Z cpp x86_64 7.3.1-15.amzn2 amzn2-core 9.2 M 2023-01-11T21:08:41.5819258Z dwz x86_64 0.11-3.amzn2.0.3 amzn2-core 98 k 2023-01-11T21:08:41.5819678Z efivar-libs x86_64 31-4.amzn2.0.4 amzn2-core 68 k 2023-01-11T21:08:41.5820126Z elfutils-libelf-devel x86_64 0.176-2.amzn2 amzn2-core 40 k 2023-01-11T21:08:41.5820569Z emacs-filesystem noarch 1:27.2-4.amzn2.0.1 amzn2-core 67 k 2023-01-11T21:08:41.5821004Z gdb x86_64 8.0.1-36.amzn2.0.1 amzn2-core 
3.1 M 2023-01-11T21:08:41.5821526Z gettext-common-devel noarch 0.19.8.1-3.amzn2 amzn2-core 410 k 2023-01-11T21:08:41.5821986Z gettext-devel x86_64 0.19.8.1-3.amzn2 amzn2-core 320 k 2023-01-11T21:08:41.5822423Z glibc-devel x86_64 2.26-62.amzn2 amzn2-core 995 k 2023-01-11T21:08:41.5822870Z glibc-headers x86_64 2.26-62.amzn2 amzn2-core 516 k 2023-01-11T21:08:41.5823304Z gnutls x86_64 3.3.29-9.amzn2.0.1 amzn2-core 661 k 2023-01-11T21:08:41.5823730Z go-srpm-macros noarch 3.0.15-23.amzn2.0.2 amzn2-core 23 k 2023-01-11T21:08:41.5824184Z kernel-devel x86_64 4.14.301-224.520.amzn2 amzn2-core 13 M 2023-01-11T21:08:41.5824630Z kernel-headers x86_64 4.14.301-224.520.amzn2 amzn2-core 1.2 M 2023-01-11T21:08:41.5825046Z libatomic x86_64 7.3.1-15.amzn2 amzn2-core 46 k 2023-01-11T21:08:41.5825477Z libcilkrts x86_64 7.3.1-15.amzn2 amzn2-core 85 k 2023-01-11T21:08:41.5825903Z libgfortran x86_64 7.3.1-15.amzn2 amzn2-core 536 k 2023-01-11T21:08:41.5826325Z libitm x86_64 7.3.1-15.amzn2 amzn2-core 85 k 2023-01-11T21:08:41.5826733Z libmodman x86_64 2.0.1-8.amzn2.0.2 amzn2-core 29 k 2023-01-11T21:08:41.5827164Z libmpc x86_64 1.0.1-3.amzn2.0.2 amzn2-core 52 k 2023-01-11T21:08:41.5827587Z libmpx x86_64 7.3.1-15.amzn2 amzn2-core 51 k 2023-01-11T21:08:41.5827991Z libproxy x86_64 0.4.11-10.amzn2.0.3 amzn2-core 61 k 2023-01-11T21:08:41.5828416Z libquadmath x86_64 7.3.1-15.amzn2 amzn2-core 189 k 2023-01-11T21:08:41.5828853Z libsanitizer x86_64 7.3.1-15.amzn2 amzn2-core 642 k 2023-01-11T21:08:41.5829272Z m4 x86_64 1.4.16-10.amzn2.0.2 amzn2-core 256 k 2023-01-11T21:08:41.5829678Z mokutil x86_64 1:0.3.0-10.amzn2.0.1 amzn2-core 39 k 2023-01-11T21:08:41.5830098Z mpfr x86_64 3.1.1-4.amzn2.0.2 amzn2-core 208 k 2023-01-11T21:08:41.5830512Z neon x86_64 0.30.0-3.amzn2.0.2 amzn2-core 166 k 2023-01-11T21:08:41.5830921Z pakchois x86_64 0.4-10.amzn2.0.2 amzn2-core 14 k 2023-01-11T21:08:41.5831362Z perl-Data-Dumper x86_64 2.145-3.amzn2.0.2 amzn2-core 48 k 2023-01-11T21:08:41.5831823Z perl-Test-Harness noarch 
3.28-3.amzn2 amzn2-core 302 k 2023-01-11T21:08:41.5832288Z perl-Thread-Queue noarch 3.02-2.amzn2 amzn2-core 17 k 2023-01-11T21:08:41.5832740Z perl-XML-Parser x86_64 2.41-10.amzn2.0.2 amzn2-core 223 k 2023-01-11T21:08:41.5833198Z perl-srpm-macros noarch 1-8.amzn2.0.1 amzn2-core 4.7 k 2023-01-11T21:08:41.5833662Z subversion-libs x86_64 1.7.14-16.amzn2.0.1 amzn2-core 912 k 2023-01-11T21:08:41.5834096Z systemtap-client x86_64 4.5-1.amzn2.0.1 amzn2-core 3.7 M 2023-01-11T21:08:41.5834545Z systemtap-devel x86_64 4.5-1.amzn2.0.1 amzn2-core 2.3 M 2023-01-11T21:08:41.5835059Z trousers x86_64 0.3.14-2.amzn2.0.2 amzn2-core 294 k 2023-01-11T21:08:41.5835484Z zlib-devel x86_64 1.2.7-19.amzn2.0.2 amzn2-core 50 k 2023-01-11T21:08:41.5835672Z 2023-01-11T21:08:41.5835787Z Transaction Summary 2023-01-11T21:08:41.5836072Z ================================================================================ 2023-01-11T21:08:41.5836386Z Install 25 Packages (+43 Dependent packages) 2023-01-11T21:08:41.5836583Z 2023-01-11T21:08:41.5836698Z Total download size: 96 M 2023-01-11T21:08:41.5836959Z Installed size: 303 M 2023-01-11T21:08:41.5837222Z Downloading packages: 2023-01-11T21:08:41.5863383Z Delta RPMs disabled because /usr/bin/applydeltarpm not installed. 
2023-01-11T21:08:42.8535458Z -------------------------------------------------------------------------------- 2023-01-11T21:08:42.8535930Z Total 76 MB/s | 96 MB 00:01 2023-01-11T21:08:42.9582383Z Running transaction check 2023-01-11T21:08:43.0345308Z Running transaction test 2023-01-11T21:08:45.2816082Z Transaction test succeeded 2023-01-11T21:08:45.2819432Z Running transaction 2023-01-11T21:08:50.5622521Z Installing : mpfr-3.1.1-4.amzn2.0.2.x86_64 1/68 2023-01-11T21:08:53.1167395Z Installing : libmpc-1.0.1-3.amzn2.0.2.x86_64 2/68 2023-01-11T21:08:55.9126162Z Installing : m4-1.4.16-10.amzn2.0.2.x86_64 3/68 2023-01-11T21:08:58.2310322Z Installing : apr-1.7.0-9.amzn2.x86_64 4/68 2023-01-11T21:09:00.7139021Z Installing : apr-util-bdb-1.6.1-5.amzn2.0.2.x86_64 5/68 2023-01-11T21:09:03.1537713Z Installing : apr-util-1.6.1-5.amzn2.0.2.x86_64 6/68 2023-01-11T21:09:05.6369103Z Installing : avahi-libs-0.6.31-20.amzn2.x86_64 7/68 2023-01-11T21:09:06.0931849Z Installing : libquadmath-7.3.1-15.amzn2.x86_64 8/68 2023-01-11T21:09:06.1157994Z Installing : patch-2.7.1-12.amzn2.0.2.x86_64 9/68 2023-01-11T21:09:06.1967231Z Installing : perl-Thread-Queue-3.02-2.amzn2.noarch 10/68 2023-01-11T21:09:07.2581689Z Installing : libgfortran-7.3.1-15.amzn2.x86_64 11/68 2023-01-11T21:09:07.2920246Z Installing : cpp-7.3.1-15.amzn2.x86_64 12/68 2023-01-11T21:09:07.3244442Z Installing : libmodman-2.0.1-8.amzn2.0.2.x86_64 13/68 2023-01-11T21:09:07.3813984Z Installing : libproxy-0.4.11-10.amzn2.0.3.x86_64 14/68 2023-01-11T21:09:07.4394718Z Installing : perl-XML-Parser-2.41-10.amzn2.0.2.x86_64 15/68 2023-01-11T21:09:07.5486855Z Installing : elfutils-0.176-2.amzn2.x86_64 16/68 2023-01-11T21:09:07.5799734Z Installing : libsanitizer-7.3.1-15.amzn2.x86_64 17/68 2023-01-11T21:09:07.6048706Z Installing : 1:emacs-filesystem-27.2-4.amzn2.0.1.noarch 18/68 2023-01-11T21:09:07.6360241Z Installing : efivar-libs-31-4.amzn2.0.4.x86_64 19/68 2023-01-11T21:09:07.6626723Z Installing : 
1:mokutil-0.3.0-10.amzn2.0.1.x86_64 20/68 2023-01-11T21:09:07.7404573Z Installing : gettext-common-devel-0.19.8.1-3.amzn2.noarch 21/68 2023-01-11T21:09:07.7900224Z Installing : gettext-devel-0.19.8.1-3.amzn2.x86_64 22/68 2023-01-11T21:09:07.8740171Z Installing : dwz-0.11-3.amzn2.0.3.x86_64 23/68 2023-01-11T21:09:08.0518026Z Installing : trousers-0.3.14-2.amzn2.0.2.x86_64 24/68 2023-01-11T21:09:08.0849150Z Installing : gnutls-3.3.29-9.amzn2.0.1.x86_64 25/68 2023-01-11T21:09:08.4788946Z Installing : libitm-7.3.1-15.amzn2.x86_64 26/68 2023-01-11T21:09:08.5085010Z Installing : gdb-8.0.1-36.amzn2.0.1.x86_64 27/68 2023-01-11T21:09:08.5386145Z Installing : libmpx-7.3.1-15.amzn2.x86_64 28/68 2023-01-11T21:09:08.5589268Z Installing : perl-srpm-macros-1-8.amzn2.0.1.noarch 29/68 2023-01-11T21:09:08.5923873Z Installing : go-srpm-macros-3.0.15-23.amzn2.0.2.noarch 30/68 2023-01-11T21:09:08.6195738Z Installing : system-rpm-config-9.1.0-76.amzn2.0.14.noarch 31/68 2023-01-11T21:09:08.7100477Z Installing : perl-Data-Dumper-2.145-3.amzn2.0.2.x86_64 32/68 2023-01-11T21:09:08.7965765Z Installing : autoconf-2.69-11.amzn2.noarch 33/68 2023-01-11T21:09:08.9043201Z Installing : perl-Test-Harness-3.28-3.amzn2.noarch 34/68 2023-01-11T21:09:08.9464890Z Installing : automake-1.13.4-3.1.amzn2.noarch 35/68 2023-01-11T21:09:08.9724059Z Installing : zlib-devel-1.2.7-19.amzn2.0.2.x86_64 36/68 2023-01-11T21:09:08.9926504Z Installing : elfutils-libelf-devel-0.176-2.amzn2.x86_64 37/68 2023-01-11T21:09:09.2822673Z Installing : libatomic-7.3.1-15.amzn2.x86_64 38/68 2023-01-11T21:09:09.4460880Z Installing : kernel-headers-4.14.301-224.520.amzn2.x86_64 39/68 2023-01-11T21:09:09.5783227Z Installing : glibc-headers-2.26-62.amzn2.x86_64 40/68 2023-01-11T21:09:09.6125019Z Installing : glibc-devel-2.26-62.amzn2.x86_64 41/68 2023-01-11T21:09:11.6484339Z Installing : libcilkrts-7.3.1-15.amzn2.x86_64 42/68 2023-01-11T21:09:15.5021894Z Installing : gcc-7.3.1-15.amzn2.x86_64 43/68 2023-01-11T21:09:26.6828435Z 
Installing : kernel-devel-4.14.301-224.520.amzn2.x86_64 44/68 2023-01-11T21:09:27.2997021Z Installing : systemtap-devel-4.5-1.amzn2.0.1.x86_64 45/68 2023-01-11T21:09:27.3581897Z Installing : systemtap-client-4.5-1.amzn2.0.1.x86_64 46/68 2023-01-11T21:09:27.4135090Z Installing : pakchois-0.4-10.amzn2.0.2.x86_64 47/68 2023-01-11T21:09:27.5472373Z Installing : neon-0.30.0-3.amzn2.0.2.x86_64 48/68 2023-01-11T21:09:27.7256426Z Installing : subversion-libs-1.7.14-16.amzn2.0.1.x86_64 49/68 2023-01-11T21:09:27.8302230Z Installing : subversion-1.7.14-16.amzn2.0.1.x86_64 50/68 2023-01-11T21:09:29.0460928Z Installing : systemtap-4.5-1.amzn2.0.1.x86_64 51/68 2023-01-11T21:09:30.6700200Z Installing : gcc-gfortran-7.3.1-15.amzn2.x86_64 52/68 2023-01-11T21:09:30.7799088Z Installing : gcc-c++-7.3.1-15.amzn2.x86_64 53/68 2023-01-11T21:09:30.8177469Z Installing : libtool-2.4.2-22.2.amzn2.0.2.x86_64 54/68 2023-01-11T21:09:30.8545387Z Installing : intltool-0.50.2-7.amzn2.noarch 55/68 2023-01-11T21:09:30.9058818Z Installing : rpm-build-4.11.3-48.amzn2.0.2.x86_64 56/68 2023-01-11T21:09:30.9621201Z Installing : cscope-15.8-10.amzn2.0.2.x86_64 57/68 2023-01-11T21:09:31.0643843Z Installing : flex-2.5.37-3.amzn2.0.3.x86_64 58/68 2023-01-11T21:09:31.1263528Z Installing : bison-3.0.4-6.amzn2.0.2.x86_64 59/68 2023-01-11T21:09:31.1716097Z Installing : rcs-5.9.0-5.amzn2.0.2.x86_64 60/68 2023-01-11T21:09:31.2084121Z Installing : ctags-5.8-13.amzn2.0.2.x86_64 61/68 2023-01-11T21:09:31.2514348Z Installing : indent-2.2.11-13.amzn2.0.2.x86_64 62/68 2023-01-11T21:09:31.9413700Z Installing : patchutils-0.3.3-4.amzn2.0.1.x86_64 63/68 2023-01-11T21:09:31.9842735Z Installing : 1:doxygen-1.8.5-4.amzn2.x86_64 64/68 2023-01-11T21:09:32.0108226Z Installing : diffstat-1.57-4.amzn2.0.2.x86_64 65/68 2023-01-11T21:09:32.3201581Z Installing : byacc-1.9.20130304-3.amzn2.0.2.x86_64 66/68 2023-01-11T21:09:32.3778829Z Installing : swig-3.0.12-11.amzn2.0.3.x86_64 67/68 2023-01-11T21:09:32.4445910Z Installing : 
rpm-sign-4.11.3-48.amzn2.0.2.x86_64 68/68 2023-01-11T21:09:32.4563866Z Verifying : elfutils-libelf-devel-0.176-2.amzn2.x86_64 1/68 2023-01-11T21:09:32.4656085Z Verifying : perl-Thread-Queue-3.02-2.amzn2.noarch 2/68 2023-01-11T21:09:32.4747940Z Verifying : gettext-devel-0.19.8.1-3.amzn2.x86_64 3/68 2023-01-11T21:09:32.4840836Z Verifying : patch-2.7.1-12.amzn2.0.2.x86_64 4/68 2023-01-11T21:09:32.4953076Z Verifying : flex-2.5.37-3.amzn2.0.3.x86_64 5/68 2023-01-11T21:09:32.5054046Z Verifying : pakchois-0.4-10.amzn2.0.2.x86_64 6/68 2023-01-11T21:09:32.5162071Z Verifying : rpm-sign-4.11.3-48.amzn2.0.2.x86_64 7/68 2023-01-11T21:09:32.5244490Z Verifying : glibc-devel-2.26-62.amzn2.x86_64 8/68 2023-01-11T21:09:32.5333948Z Verifying : gcc-gfortran-7.3.1-15.amzn2.x86_64 9/68 2023-01-11T21:09:32.5426229Z Verifying : swig-3.0.12-11.amzn2.0.3.x86_64 10/68 2023-01-11T21:09:32.5517794Z Verifying : byacc-1.9.20130304-3.amzn2.0.2.x86_64 11/68 2023-01-11T21:09:32.5603503Z Verifying : libmpc-1.0.1-3.amzn2.0.2.x86_64 12/68 2023-01-11T21:09:32.5690661Z Verifying : libcilkrts-7.3.1-15.amzn2.x86_64 13/68 2023-01-11T21:09:32.5800334Z Verifying : kernel-headers-4.14.301-224.520.amzn2.x86_64 14/68 2023-01-11T21:09:32.5882292Z Verifying : libproxy-0.4.11-10.amzn2.0.3.x86_64 15/68 2023-01-11T21:09:32.5961423Z Verifying : cscope-15.8-10.amzn2.0.2.x86_64 16/68 2023-01-11T21:09:32.6062885Z Verifying : diffstat-1.57-4.amzn2.0.2.x86_64 17/68 2023-01-11T21:09:32.6165937Z Verifying : 1:doxygen-1.8.5-4.amzn2.x86_64 18/68 2023-01-11T21:09:32.6259672Z Verifying : gcc-c++-7.3.1-15.amzn2.x86_64 19/68 2023-01-11T21:09:32.6355023Z Verifying : libatomic-7.3.1-15.amzn2.x86_64 20/68 2023-01-11T21:09:32.6443792Z Verifying : system-rpm-config-9.1.0-76.amzn2.0.14.noarch 21/68 2023-01-11T21:09:32.6549120Z Verifying : systemtap-devel-4.5-1.amzn2.0.1.x86_64 22/68 2023-01-11T21:09:32.6647266Z Verifying : zlib-devel-1.2.7-19.amzn2.0.2.x86_64 23/68 2023-01-11T21:09:32.6742298Z Verifying : 
glibc-headers-2.26-62.amzn2.x86_64 24/68 2023-01-11T21:09:32.6843957Z Verifying : perl-Test-Harness-3.28-3.amzn2.noarch 25/68 2023-01-11T21:09:32.6936436Z Verifying : autoconf-2.69-11.amzn2.noarch 26/68 2023-01-11T21:09:32.7045724Z Verifying : libquadmath-7.3.1-15.amzn2.x86_64 27/68 2023-01-11T21:09:32.7159170Z Verifying : intltool-0.50.2-7.amzn2.noarch 28/68 2023-01-11T21:09:32.7251275Z Verifying : apr-util-1.6.1-5.amzn2.0.2.x86_64 29/68 2023-01-11T21:09:32.7346556Z Verifying : cpp-7.3.1-15.amzn2.x86_64 30/68 2023-01-11T21:09:32.7453037Z Verifying : rpm-build-4.11.3-48.amzn2.0.2.x86_64 31/68 2023-01-11T21:09:32.7557158Z Verifying : go-srpm-macros-3.0.15-23.amzn2.0.2.noarch 32/68 2023-01-11T21:09:32.7656430Z Verifying : perl-Data-Dumper-2.145-3.amzn2.0.2.x86_64 33/68 2023-01-11T21:09:32.7741381Z Verifying : perl-srpm-macros-1-8.amzn2.0.1.noarch 34/68 2023-01-11T21:09:32.7840877Z Verifying : gnutls-3.3.29-9.amzn2.0.1.x86_64 35/68 2023-01-11T21:09:32.7941239Z Verifying : subversion-libs-1.7.14-16.amzn2.0.1.x86_64 36/68 2023-01-11T21:09:32.8043193Z Verifying : automake-1.13.4-3.1.amzn2.noarch 37/68 2023-01-11T21:09:32.8126705Z Verifying : apr-util-bdb-1.6.1-5.amzn2.0.2.x86_64 38/68 2023-01-11T21:09:32.8216381Z Verifying : libmpx-7.3.1-15.amzn2.x86_64 39/68 2023-01-11T21:09:32.8317306Z Verifying : avahi-libs-0.6.31-20.amzn2.x86_64 40/68 2023-01-11T21:09:32.8405399Z Verifying : bison-3.0.4-6.amzn2.0.2.x86_64 41/68 2023-01-11T21:09:32.8493659Z Verifying : libgfortran-7.3.1-15.amzn2.x86_64 42/68 2023-01-11T21:09:32.8615877Z Verifying : gdb-8.0.1-36.amzn2.0.1.x86_64 43/68 2023-01-11T21:09:32.8700290Z Verifying : patchutils-0.3.3-4.amzn2.0.1.x86_64 44/68 2023-01-11T21:09:32.8791669Z Verifying : libitm-7.3.1-15.amzn2.x86_64 45/68 2023-01-11T21:09:32.8878915Z Verifying : libtool-2.4.2-22.2.amzn2.0.2.x86_64 46/68 2023-01-11T21:09:32.8973302Z Verifying : gcc-7.3.1-15.amzn2.x86_64 47/68 2023-01-11T21:09:32.9077598Z Verifying : indent-2.2.11-13.amzn2.0.2.x86_64 48/68 
2023-01-11T21:09:32.9163455Z Verifying : subversion-1.7.14-16.amzn2.0.1.x86_64 49/68 2023-01-11T21:09:32.9257815Z Verifying : apr-1.7.0-9.amzn2.x86_64 50/68 2023-01-11T21:09:32.9343127Z Verifying : ctags-5.8-13.amzn2.0.2.x86_64 51/68 2023-01-11T21:09:32.9436866Z Verifying : 1:mokutil-0.3.0-10.amzn2.0.1.x86_64 52/68 2023-01-11T21:09:32.9526915Z Verifying : mpfr-3.1.1-4.amzn2.0.2.x86_64 53/68 2023-01-11T21:09:32.9607008Z Verifying : trousers-0.3.14-2.amzn2.0.2.x86_64 54/68 2023-01-11T21:09:32.9692759Z Verifying : neon-0.30.0-3.amzn2.0.2.x86_64 55/68 2023-01-11T21:09:32.9795909Z Verifying : systemtap-4.5-1.amzn2.0.1.x86_64 56/68 2023-01-11T21:09:32.9884885Z Verifying : dwz-0.11-3.amzn2.0.3.x86_64 57/68 2023-01-11T21:09:32.9984501Z Verifying : gettext-common-devel-0.19.8.1-3.amzn2.noarch 58/68 2023-01-11T21:09:33.0084022Z Verifying : systemtap-client-4.5-1.amzn2.0.1.x86_64 59/68 2023-01-11T21:09:33.0187825Z Verifying : efivar-libs-31-4.amzn2.0.4.x86_64 60/68 2023-01-11T21:09:33.0278344Z Verifying : rcs-5.9.0-5.amzn2.0.2.x86_64 61/68 2023-01-11T21:09:33.0442228Z Verifying : kernel-devel-4.14.301-224.520.amzn2.x86_64 62/68 2023-01-11T21:09:33.0542420Z Verifying : 1:emacs-filesystem-27.2-4.amzn2.0.1.noarch 63/68 2023-01-11T21:09:33.0650444Z Verifying : libsanitizer-7.3.1-15.amzn2.x86_64 64/68 2023-01-11T21:09:33.0741915Z Verifying : elfutils-0.176-2.amzn2.x86_64 65/68 2023-01-11T21:09:33.0836795Z Verifying : m4-1.4.16-10.amzn2.0.2.x86_64 66/68 2023-01-11T21:09:33.0930616Z Verifying : perl-XML-Parser-2.41-10.amzn2.0.2.x86_64 67/68 2023-01-11T21:09:33.1697510Z Verifying : libmodman-2.0.1-8.amzn2.0.2.x86_64 68/68 2023-01-11T21:09:33.1699383Z 2023-01-11T21:09:33.1700090Z Installed: 2023-01-11T21:09:33.1700671Z autoconf.noarch 0:2.69-11.amzn2 2023-01-11T21:09:33.1701354Z automake.noarch 0:1.13.4-3.1.amzn2 2023-01-11T21:09:33.1706128Z bison.x86_64 0:3.0.4-6.amzn2.0.2 2023-01-11T21:09:33.1706608Z byacc.x86_64 0:1.9.20130304-3.amzn2.0.2 2023-01-11T21:09:33.1707042Z cscope.x86_64 
0:15.8-10.amzn2.0.2 2023-01-11T21:09:33.1707461Z ctags.x86_64 0:5.8-13.amzn2.0.2 2023-01-11T21:09:33.1707859Z diffstat.x86_64 0:1.57-4.amzn2.0.2 2023-01-11T21:09:33.1708274Z doxygen.x86_64 1:1.8.5-4.amzn2 2023-01-11T21:09:33.1708887Z elfutils.x86_64 0:0.176-2.amzn2 2023-01-11T21:09:33.1709327Z flex.x86_64 0:2.5.37-3.amzn2.0.3 2023-01-11T21:09:33.1709735Z gcc.x86_64 0:7.3.1-15.amzn2 2023-01-11T21:09:33.1710159Z gcc-c++.x86_64 0:7.3.1-15.amzn2 2023-01-11T21:09:33.1710582Z gcc-gfortran.x86_64 0:7.3.1-15.amzn2 2023-01-11T21:09:33.1711071Z indent.x86_64 0:2.2.11-13.amzn2.0.2 2023-01-11T21:09:33.1711487Z intltool.noarch 0:0.50.2-7.amzn2 2023-01-11T21:09:33.1711901Z libtool.x86_64 0:2.4.2-22.2.amzn2.0.2 2023-01-11T21:09:33.1712396Z patch.x86_64 0:2.7.1-12.amzn2.0.2 2023-01-11T21:09:33.1712811Z patchutils.x86_64 0:0.3.3-4.amzn2.0.1 2023-01-11T21:09:33.1713232Z rcs.x86_64 0:5.9.0-5.amzn2.0.2 2023-01-11T21:09:33.1713649Z rpm-build.x86_64 0:4.11.3-48.amzn2.0.2 2023-01-11T21:09:33.1714069Z rpm-sign.x86_64 0:4.11.3-48.amzn2.0.2 2023-01-11T21:09:33.1714499Z subversion.x86_64 0:1.7.14-16.amzn2.0.1 2023-01-11T21:09:33.1715279Z swig.x86_64 0:3.0.12-11.amzn2.0.3 2023-01-11T21:09:33.1716113Z system-rpm-config.noarch 0:9.1.0-76.amzn2.0.14 2023-01-11T21:09:33.1718079Z systemtap.x86_64 0:4.5-1.amzn2.0.1 2023-01-11T21:09:33.1718514Z 2023-01-11T21:09:33.1718760Z Dependency Installed: 2023-01-11T21:09:33.1719527Z apr.x86_64 0:1.7.0-9.amzn2 2023-01-11T21:09:33.1720448Z apr-util.x86_64 0:1.6.1-5.amzn2.0.2 2023-01-11T21:09:33.1721117Z apr-util-bdb.x86_64 0:1.6.1-5.amzn2.0.2 2023-01-11T21:09:33.1721549Z avahi-libs.x86_64 0:0.6.31-20.amzn2 2023-01-11T21:09:33.1721969Z cpp.x86_64 0:7.3.1-15.amzn2 2023-01-11T21:09:33.1722397Z dwz.x86_64 0:0.11-3.amzn2.0.3 2023-01-11T21:09:33.1722789Z efivar-libs.x86_64 0:31-4.amzn2.0.4 2023-01-11T21:09:33.1723235Z elfutils-libelf-devel.x86_64 0:0.176-2.amzn2 2023-01-11T21:09:33.1723691Z emacs-filesystem.noarch 1:27.2-4.amzn2.0.1 2023-01-11T21:09:33.1724121Z 
gdb.x86_64 0:8.0.1-36.amzn2.0.1 2023-01-11T21:09:33.1724546Z gettext-common-devel.noarch 0:0.19.8.1-3.amzn2 2023-01-11T21:09:33.1724994Z gettext-devel.x86_64 0:0.19.8.1-3.amzn2 2023-01-11T21:09:33.1725416Z glibc-devel.x86_64 0:2.26-62.amzn2 2023-01-11T21:09:33.1725944Z glibc-headers.x86_64 0:2.26-62.amzn2 2023-01-11T21:09:33.1726362Z gnutls.x86_64 0:3.3.29-9.amzn2.0.1 2023-01-11T21:09:33.1726789Z go-srpm-macros.noarch 0:3.0.15-23.amzn2.0.2 2023-01-11T21:09:33.1727233Z kernel-devel.x86_64 0:4.14.301-224.520.amzn2 2023-01-11T21:09:33.1727641Z kernel-headers.x86_64 0:4.14.301-224.520.amzn2 2023-01-11T21:09:33.1728059Z libatomic.x86_64 0:7.3.1-15.amzn2 2023-01-11T21:09:33.1728541Z libcilkrts.x86_64 0:7.3.1-15.amzn2 2023-01-11T21:09:33.1728954Z libgfortran.x86_64 0:7.3.1-15.amzn2 2023-01-11T21:09:33.1729363Z libitm.x86_64 0:7.3.1-15.amzn2 2023-01-11T21:09:33.1729770Z libmodman.x86_64 0:2.0.1-8.amzn2.0.2 2023-01-11T21:09:33.1730163Z libmpc.x86_64 0:1.0.1-3.amzn2.0.2 2023-01-11T21:09:33.1730569Z libmpx.x86_64 0:7.3.1-15.amzn2 2023-01-11T21:09:33.1730975Z libproxy.x86_64 0:0.4.11-10.amzn2.0.3 2023-01-11T21:09:33.1731381Z libquadmath.x86_64 0:7.3.1-15.amzn2 2023-01-11T21:09:33.1731779Z libsanitizer.x86_64 0:7.3.1-15.amzn2 2023-01-11T21:09:33.1732184Z m4.x86_64 0:1.4.16-10.amzn2.0.2 2023-01-11T21:09:33.1732597Z mokutil.x86_64 1:0.3.0-10.amzn2.0.1 2023-01-11T21:09:33.1732985Z mpfr.x86_64 0:3.1.1-4.amzn2.0.2 2023-01-11T21:09:33.1733390Z neon.x86_64 0:0.30.0-3.amzn2.0.2 2023-01-11T21:09:33.1733805Z pakchois.x86_64 0:0.4-10.amzn2.0.2 2023-01-11T21:09:33.1734233Z perl-Data-Dumper.x86_64 0:2.145-3.amzn2.0.2 2023-01-11T21:09:33.1734667Z perl-Test-Harness.noarch 0:3.28-3.amzn2 2023-01-11T21:09:33.1735122Z perl-Thread-Queue.noarch 0:3.02-2.amzn2 2023-01-11T21:09:33.1735578Z perl-XML-Parser.x86_64 0:2.41-10.amzn2.0.2 2023-01-11T21:09:33.1736013Z perl-srpm-macros.noarch 0:1-8.amzn2.0.1 2023-01-11T21:09:33.1736467Z subversion-libs.x86_64 0:1.7.14-16.amzn2.0.1 
2023-01-11T21:09:33.1737563Z systemtap-client.x86_64 0:4.5-1.amzn2.0.1 2023-01-11T21:09:33.1738008Z systemtap-devel.x86_64 0:4.5-1.amzn2.0.1 2023-01-11T21:09:33.1738421Z trousers.x86_64 0:0.3.14-2.amzn2.0.2 2023-01-11T21:09:33.1738838Z zlib-devel.x86_64 0:1.2.7-19.amzn2.0.2 2023-01-11T21:09:33.1739038Z 2023-01-11T21:09:33.1739141Z Complete! 2023-01-11T21:09:33.2096490Z ++ uname -r 2023-01-11T21:09:33.2107291Z + sudo yum install -y 'kernel-devel-uname-r == 4.14.252-195.483.amzn2.x86_64' 2023-01-11T21:09:33.7615033Z Loaded plugins: extras_suggestions, langpacks, priorities, update-motd 2023-01-11T21:09:33.7725861Z Existing lock /var/run/yum.pid: another copy is running as pid 35629. 2023-01-11T21:09:33.7726272Z Another app is currently holding the yum lock; waiting for it to exit... 2023-01-11T21:09:33.7734852Z The other application is: yum 2023-01-11T21:09:33.7735147Z Memory : 92 M RSS (309 MB VSZ) 2023-01-11T21:09:33.7736286Z Started: Wed Jan 11 21:09:32 2023 - 00:01 ago 2023-01-11T21:09:33.7736848Z State : Running, pid: 35629 2023-01-11T21:09:35.7761861Z Another app is currently holding the yum lock; waiting for it to exit... 
2023-01-11T21:09:35.7768529Z The other application is: yum 2023-01-11T21:09:35.7768827Z Memory : 169 M RSS (387 MB VSZ) 2023-01-11T21:09:35.7769588Z Started: Wed Jan 11 21:09:32 2023 - 00:03 ago 2023-01-11T21:09:35.7770047Z State : Running, pid: 35629 2023-01-11T21:09:38.0552229Z Resolving Dependencies 2023-01-11T21:09:38.0558445Z --> Running transaction check 2023-01-11T21:09:38.0559282Z ---> Package kernel-devel.x86_64 0:4.14.252-195.483.amzn2 will be installed 2023-01-11T21:09:38.3454934Z --> Finished Dependency Resolution 2023-01-11T21:09:38.4261129Z 2023-01-11T21:09:38.4261526Z Dependencies Resolved 2023-01-11T21:09:38.4267675Z 2023-01-11T21:09:38.4268295Z ================================================================================ 2023-01-11T21:09:38.4268681Z Package Arch Version Repository Size 2023-01-11T21:09:38.4269029Z ================================================================================ 2023-01-11T21:09:38.4269330Z Installing: 2023-01-11T21:09:38.4269809Z kernel-devel x86_64 4.14.252-195.483.amzn2 amzn2-core 13 M 2023-01-11T21:09:38.4270009Z 2023-01-11T21:09:38.4270128Z Transaction Summary 2023-01-11T21:09:38.4270418Z ================================================================================ 2023-01-11T21:09:38.4270689Z Install 1 Package 2023-01-11T21:09:38.4270847Z 2023-01-11T21:09:38.4270957Z Total download size: 13 M 2023-01-11T21:09:38.4271220Z Installed size: 60 M 2023-01-11T21:09:38.4271483Z Downloading packages: 2023-01-11T21:09:38.4280660Z Delta RPMs disabled because /usr/bin/applydeltarpm not installed. 
2023-01-11T21:09:38.7233964Z Running transaction check 2023-01-11T21:09:38.7423850Z Running transaction test 2023-01-11T21:09:39.1437835Z Transaction test succeeded 2023-01-11T21:09:39.1440894Z Running transaction 2023-01-11T21:09:54.2788025Z Installing : kernel-devel-4.14.252-195.483.amzn2.x86_64 1/1 2023-01-11T21:09:54.3597682Z Verifying : kernel-devel-4.14.252-195.483.amzn2.x86_64 1/1 2023-01-11T21:09:54.3598307Z 2023-01-11T21:09:54.3598505Z Installed: 2023-01-11T21:09:54.3598997Z kernel-devel.x86_64 0:4.14.252-195.483.amzn2 2023-01-11T21:09:54.3599209Z 2023-01-11T21:09:54.3599319Z Complete! 2023-01-11T21:09:54.3915880Z + sudo modprobe backlight 2023-01-11T21:09:54.4113344Z + sudo curl -fsL -o /tmp/nvidia_driver https://s3.amazonaws.com/ossci-linux/nvidia_driver/NVIDIA-Linux-x86_64-515.76.run 2023-01-11T21:09:58.0924994Z + set +e 2023-01-11T21:09:58.0925893Z + sudo /bin/bash /tmp/nvidia_driver -s --no-drm 2023-01-11T21:09:59.4763044Z Verifying archive integrity... OK 2023-01-11T21:10:26.3790341Z Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 
515.76................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................ 2023-01-11T21:10:26.5206838Z 2023-01-11T21:10:26.5207762Z WARNING: The nvidia-drm module will not be installed. As a result, DRM-KMS will not function with this installation of the NVIDIA driver. 2023-01-11T21:10:26.5208319Z 2023-01-11T21:10:39.5383885Z 2023-01-11T21:10:39.5385265Z WARNING: nvidia-installer was forced to guess the X library path '/usr/lib64' and X module path '/usr/lib64/xorg/modules'; these paths were not queryable from the system. If X fails to find the NVIDIA X driver module, please install the `pkg-config` utility and the X.Org SDK/development package for your distribution and reinstall the driver. 
2023-01-11T21:10:39.5385868Z 2023-01-11T21:10:46.9994266Z + NVIDIA_INSTALLATION_STATUS=0 2023-01-11T21:10:46.9994599Z + RESET_GPU=0 2023-01-11T21:10:46.9997695Z + '[' 0 -ne 0 ']' 2023-01-11T21:10:46.9998574Z ++ command -v nvidia-smi 2023-01-11T21:10:46.9999458Z + '[' -x /usr/bin/nvidia-smi ']' 2023-01-11T21:10:47.0003445Z ++ nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0 2023-01-11T21:10:52.0621676Z + INSTALLED_DRIVER_VERSION=515.76 2023-01-11T21:10:52.0622028Z + NVIDIA_SMI_STATUS=0 2023-01-11T21:10:52.0622435Z + '[' 0 -ne 0 ']' 2023-01-11T21:10:52.0622694Z + '[' 0 -eq 1 ']' 2023-01-11T21:10:52.0625964Z + sudo rm -fv /tmp/nvidia_driver 2023-01-11T21:10:52.1232562Z removed ‘/tmp/nvidia_driver’ 2023-01-11T21:10:52.1248302Z + set -e 2023-01-11T21:10:52.1250534Z + post_install_nvidia_driver_common 2023-01-11T21:10:52.1253857Z + sudo modprobe nvidia 2023-01-11T21:10:52.1374083Z + echo 'After installing NVIDIA driver' 2023-01-11T21:10:52.1374511Z + lspci 2023-01-11T21:10:52.1374751Z After installing NVIDIA driver 2023-01-11T21:10:52.1561327Z 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02) 2023-01-11T21:10:52.1561764Z 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] 2023-01-11T21:10:52.1562256Z 00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II] 2023-01-11T21:10:52.1562642Z 00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 01) 2023-01-11T21:10:52.1563012Z 00:02.0 VGA compatible controller: Cirrus Logic GD 5446 2023-01-11T21:10:52.1563396Z 00:03.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA) 2023-01-11T21:10:52.1563837Z 00:1d.0 VGA compatible controller: NVIDIA Corporation GM204GL [Tesla M60] (rev a1) 2023-01-11T21:10:52.1564242Z 00:1e.0 VGA compatible controller: NVIDIA Corporation GM204GL [Tesla M60] (rev a1) 2023-01-11T21:10:52.1564653Z 00:1f.0 Unassigned class [ff80]: XenSource, Inc. 
Xen Platform Device (rev 01) 2023-01-11T21:10:52.1564965Z + lsmod 2023-01-11T21:10:52.1583857Z Module Size Used by 2023-01-11T21:10:52.1584139Z nvidia 40808448 0 2023-01-11T21:10:52.1584426Z drm 425984 1 nvidia 2023-01-11T21:10:52.1584753Z i2c_core 77824 2 nvidia,drm 2023-01-11T21:10:52.1585028Z backlight 16384 0 2023-01-11T21:10:52.1585308Z xt_conntrack 16384 1 2023-01-11T21:10:52.1585576Z ipt_MASQUERADE 16384 1 2023-01-11T21:10:52.1585857Z nf_nat_masquerade_ipv4 16384 1 ipt_MASQUERADE 2023-01-11T21:10:52.1586156Z nf_conntrack_netlink 49152 0 2023-01-11T21:10:52.1586458Z nfnetlink 16384 2 nf_conntrack_netlink 2023-01-11T21:10:52.1586728Z xfrm_user 45056 1 2023-01-11T21:10:52.1587018Z xfrm_algo 16384 1 xfrm_user 2023-01-11T21:10:52.1587296Z xt_addrtype 16384 2 2023-01-11T21:10:52.1587546Z iptable_filter 16384 1 2023-01-11T21:10:52.1587803Z iptable_nat 16384 1 2023-01-11T21:10:52.1588071Z nf_conntrack_ipv4 16384 3 2023-01-11T21:10:52.1588348Z nf_defrag_ipv4 16384 1 nf_conntrack_ipv4 2023-01-11T21:10:52.1588654Z nf_nat_ipv4 16384 1 iptable_nat 2023-01-11T21:10:52.1588973Z nf_nat 36864 2 nf_nat_masquerade_ipv4,nf_nat_ipv4 2023-01-11T21:10:52.1589444Z nf_conntrack 155648 7 xt_conntrack,nf_nat_masquerade_ipv4,nf_conntrack_ipv4,nf_nat,ipt_MASQUERADE,nf_nat_ipv4,nf_conntrack_netlink 2023-01-11T21:10:52.1589835Z br_netfilter 24576 0 2023-01-11T21:10:52.1590115Z bridge 172032 1 br_netfilter 2023-01-11T21:10:52.1590395Z stp 16384 1 bridge 2023-01-11T21:10:52.1590871Z llc 16384 2 bridge,stp 2023-01-11T21:10:52.1591136Z overlay 86016 0 2023-01-11T21:10:52.1591393Z sunrpc 393216 1 2023-01-11T21:10:52.1591631Z dm_mirror 28672 0 2023-01-11T21:10:52.1591901Z dm_region_hash 20480 1 dm_mirror 2023-01-11T21:10:52.1592220Z dm_log 20480 2 dm_region_hash,dm_mirror 2023-01-11T21:10:52.1592521Z dm_mod 143360 2 dm_log,dm_mirror 2023-01-11T21:10:52.1592781Z dax 69632 1 dm_mod 2023-01-11T21:10:52.1593048Z sb_edac 24576 0 2023-01-11T21:10:52.1593308Z crc32_pclmul 16384 0 
2023-01-11T21:10:52.1593680Z ghash_clmulni_intel 16384 0 2023-01-11T21:10:52.1593962Z pcbc 16384 0 2023-01-11T21:10:52.1594218Z aesni_intel 188416 0 2023-01-11T21:10:52.1594470Z aes_x86_64 20480 1 aesni_intel 2023-01-11T21:10:52.1594735Z ata_piix 36864 0 2023-01-11T21:10:52.1595422Z crypto_simd 16384 1 aesni_intel 2023-01-11T21:10:52.1595935Z glue_helper 16384 1 aesni_intel 2023-01-11T21:10:52.1596642Z cryptd 28672 3 crypto_simd,ghash_clmulni_intel,aesni_intel 2023-01-11T21:10:52.1597266Z pcc_cpufreq 16384 0 2023-01-11T21:10:52.1597806Z libata 266240 1 ata_piix 2023-01-11T21:10:52.1598401Z mousedev 24576 0 2023-01-11T21:10:52.1598951Z evdev 20480 3 2023-01-11T21:10:52.1599261Z scsi_mod 245760 1 libata 2023-01-11T21:10:52.1599529Z psmouse 32768 0 2023-01-11T21:10:52.1599783Z button 16384 0 2023-01-11T21:10:52.1600040Z ena 114688 0 2023-01-11T21:10:52.1600276Z xen_blkfront 49152 2 2023-01-11T21:10:52.1600532Z crc32c_intel 24576 0 2023-01-11T21:10:52.1600780Z autofs4 49152 2 2023-01-11T21:10:52.1601005Z + modinfo nvidia 2023-01-11T21:10:52.1601491Z filename: /lib/modules/4.14.252-195.483.amzn2.x86_64/kernel/drivers/video/nvidia.ko 2023-01-11T21:10:52.1601841Z firmware: nvidia/515.76/gsp.bin 2023-01-11T21:10:52.1602158Z alias: char-major-195-* 2023-01-11T21:10:52.1602424Z version: 515.76 2023-01-11T21:10:52.1602686Z supported: external 2023-01-11T21:10:52.1602923Z license: NVIDIA 2023-01-11T21:10:52.1603335Z srcversion: 51FD9DD90150B35351AFFBB 2023-01-11T21:10:52.1603697Z alias: pci:v000010DEd*sv*sd*bc06sc80i00* 2023-01-11T21:10:52.1604228Z alias: pci:v000010DEd*sv*sd*bc03sc02i00* 2023-01-11T21:10:52.1604739Z alias: pci:v000010DEd*sv*sd*bc03sc00i00* 2023-01-11T21:10:52.1605084Z depends: i2c-core,drm 2023-01-11T21:10:52.1605335Z retpoline: Y 2023-01-11T21:10:52.1605584Z name: nvidia 2023-01-11T21:10:52.1605981Z vermagic: 4.14.252-195.483.amzn2.x86_64 SMP mod_unload modversions 2023-01-11T21:10:52.1606358Z parm: NvSwitchRegDwords:NvSwitch regkey (charp) 
2023-01-11T21:10:52.1606735Z parm: NvSwitchBlacklist:NvSwitchBlacklist=uuid[,uuid...] (charp) 2023-01-11T21:10:52.1607090Z parm: NVreg_ResmanDebugLevel:int 2023-01-11T21:10:52.1607384Z parm: NVreg_RmLogonRC:int 2023-01-11T21:10:52.1607668Z parm: NVreg_ModifyDeviceFiles:int 2023-01-11T21:10:52.1607973Z parm: NVreg_DeviceFileUID:int 2023-01-11T21:10:52.1608270Z parm: NVreg_DeviceFileGID:int 2023-01-11T21:10:52.1608549Z parm: NVreg_DeviceFileMode:int 2023-01-11T21:10:52.1608935Z parm: NVreg_InitializeSystemMemoryAllocations:int 2023-01-11T21:10:52.1609306Z parm: NVreg_UsePageAttributeTable:int 2023-01-11T21:10:52.1609629Z parm: NVreg_EnablePCIeGen3:int 2023-01-11T21:10:52.1609901Z parm: NVreg_EnableMSI:int 2023-01-11T21:10:52.1610187Z parm: NVreg_TCEBypassMode:int 2023-01-11T21:10:52.1610498Z parm: NVreg_EnableStreamMemOPs:int 2023-01-11T21:10:52.1610834Z parm: NVreg_RestrictProfilingToAdminUsers:int 2023-01-11T21:10:52.1611358Z parm: NVreg_PreserveVideoMemoryAllocations:int 2023-01-11T21:10:52.1611728Z parm: NVreg_EnableS0ixPowerManagement:int 2023-01-11T21:10:52.1612110Z parm: NVreg_S0ixPowerManagementVideoMemoryThreshold:int 2023-01-11T21:10:52.1612500Z parm: NVreg_DynamicPowerManagement:int 2023-01-11T21:10:52.1612906Z parm: NVreg_DynamicPowerManagementVideoMemoryThreshold:int 2023-01-11T21:10:52.1613280Z parm: NVreg_EnableGpuFirmware:int 2023-01-11T21:10:52.1613610Z parm: NVreg_EnableGpuFirmwareLogs:int 2023-01-11T21:10:52.1613971Z parm: NVreg_OpenRmEnableUnsupportedGpus:int 2023-01-11T21:10:52.1614394Z parm: NVreg_EnableUserNUMAManagement:int 2023-01-11T21:10:52.1614712Z parm: NVreg_MemoryPoolSize:int 2023-01-11T21:10:52.1615028Z parm: NVreg_KMallocHeapMaxSize:int 2023-01-11T21:10:52.1615347Z parm: NVreg_VMallocHeapMaxSize:int 2023-01-11T21:10:52.1615638Z parm: NVreg_IgnoreMMIOCheck:int 2023-01-11T21:10:52.1615941Z parm: NVreg_NvLinkDisable:int 2023-01-11T21:10:52.1616289Z parm: NVreg_EnablePCIERelaxedOrderingMode:int 2023-01-11T21:10:52.1617238Z parm: 
NVreg_RegisterPCIDriver:int 2023-01-11T21:10:52.1617586Z parm: NVreg_EnableDbgBreakpoint:int 2023-01-11T21:10:52.1617906Z parm: NVreg_RegistryDwords:charp 2023-01-11T21:10:52.1618222Z parm: NVreg_RegistryDwordsPerDevice:charp 2023-01-11T21:10:52.1618540Z parm: NVreg_RmMsg:charp 2023-01-11T21:10:52.1618828Z parm: NVreg_GpuBlacklist:charp 2023-01-11T21:10:52.1619122Z parm: NVreg_TemporaryFilePath:charp 2023-01-11T21:10:52.1619444Z parm: NVreg_ExcludedGpus:charp 2023-01-11T21:10:52.1619755Z parm: NVreg_DmaRemapPeerMmio:int 2023-01-11T21:10:52.1620057Z parm: rm_firmware_active:charp 2023-01-11T21:10:52.1620298Z + set +e 2023-01-11T21:10:52.1620573Z + nvidia-smi 2023-01-11T21:10:55.9590710Z Wed Jan 11 21:10:55 2023 2023-01-11T21:10:55.9591293Z +-----------------------------------------------------------------------------+ 2023-01-11T21:10:55.9591817Z | NVIDIA-SMI 515.76 Driver Version: 515.76 CUDA Version: 11.7 | 2023-01-11T21:10:55.9592278Z |-------------------------------+----------------------+----------------------+ 2023-01-11T21:10:55.9592792Z | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | 2023-01-11T21:10:55.9593284Z | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | 2023-01-11T21:10:55.9593635Z | | | MIG M. 
| 2023-01-11T21:10:55.9593920Z |===============================+======================+======================| 2023-01-11T21:10:55.9637264Z | 0 Tesla M60 Off | 00000000:00:1D.0 Off | 0 | 2023-01-11T21:10:55.9637637Z | N/A 31C P0 36W / 150W | 0MiB / 7680MiB | 0% Default | 2023-01-11T21:10:55.9637952Z | | | N/A | 2023-01-11T21:10:55.9638393Z +-------------------------------+----------------------+----------------------+ 2023-01-11T21:10:55.9684086Z | 1 Tesla M60 Off | 00000000:00:1E.0 Off | 0 | 2023-01-11T21:10:55.9684423Z | N/A 23C P0 37W / 150W | 0MiB / 7680MiB | 98% Default | 2023-01-11T21:10:55.9684753Z | | | N/A | 2023-01-11T21:10:55.9685782Z +-------------------------------+----------------------+----------------------+ 2023-01-11T21:10:55.9686215Z 2023-01-11T21:10:55.9686666Z +-----------------------------------------------------------------------------+ 2023-01-11T21:10:55.9687045Z | Processes: | 2023-01-11T21:10:55.9687381Z | GPU GI CI PID Type Process name GPU Memory | 2023-01-11T21:10:55.9687968Z | ID ID Usage | 2023-01-11T21:10:55.9688267Z |=============================================================================| 2023-01-11T21:10:55.9689288Z | No running processes found | 2023-01-11T21:10:55.9690377Z +-----------------------------------------------------------------------------+ 2023-01-11T21:10:56.5053645Z + NVIDIA_SMI_STATUS=0 2023-01-11T21:10:56.5054122Z + '[' 0 -eq 0 ']' 2023-01-11T21:10:56.5054553Z + echo 'INFO: Ignoring allowed status 0' 2023-01-11T21:10:56.5054893Z + set -e 2023-01-11T21:10:56.5055396Z INFO: Ignoring allowed status 0 2023-01-11T21:10:56.5061488Z == Installing nvidia container toolkit for amzn2 == 2023-01-11T21:10:56.5065358Z + sudo yum install -y yum-utils 2023-01-11T21:10:57.0456035Z Loaded plugins: extras_suggestions, langpacks, priorities, update-motd 2023-01-11T21:10:57.3196894Z Package yum-utils-1.1.31-46.amzn2.0.1.noarch already installed and latest version 2023-01-11T21:10:57.3197385Z Nothing to do 
2023-01-11T21:10:57.3400632Z + sudo yum-config-manager --add-repo https://nvidia.github.io/nvidia-docker/amzn2/nvidia-docker.repo 2023-01-11T21:10:57.8659133Z Loaded plugins: extras_suggestions, langpacks, priorities, update-motd 2023-01-11T21:10:57.8953255Z adding repo from: https://nvidia.github.io/nvidia-docker/amzn2/nvidia-docker.repo 2023-01-11T21:10:57.8954029Z grabbing file https://nvidia.github.io/nvidia-docker/amzn2/nvidia-docker.repo to /etc/yum.repos.d/nvidia-docker.repo 2023-01-11T21:10:57.8954637Z repo saved to /etc/yum.repos.d/nvidia-docker.repo 2023-01-11T21:10:57.9098484Z + sudo yum install -y nvidia-docker2 2023-01-11T21:10:58.4318210Z Loaded plugins: extras_suggestions, langpacks, priorities, update-motd 2023-01-11T21:10:58.4742452Z Retrieving key from https://nvidia.github.io/libnvidia-container/gpgkey 2023-01-11T21:10:58.4842278Z Importing GPG key 0xF796ECB0: 2023-01-11T21:10:58.4842779Z Userid : "NVIDIA CORPORATION (Open Source Projects) " 2023-01-11T21:10:58.4843372Z Fingerprint: c95b 321b 61e8 8c18 09c4 f759 ddca e044 f796 ecb0 2023-01-11T21:10:58.4843930Z From : https://nvidia.github.io/libnvidia-container/gpgkey 2023-01-11T21:10:58.8720420Z Retrieving key from https://nvidia.github.io/nvidia-container-runtime/gpgkey 2023-01-11T21:10:58.8825340Z Importing GPG key 0xF796ECB0: 2023-01-11T21:10:58.8825826Z Userid : "NVIDIA CORPORATION (Open Source Projects) " 2023-01-11T21:10:58.8826323Z Fingerprint: c95b 321b 61e8 8c18 09c4 f759 ddca e044 f796 ecb0 2023-01-11T21:10:58.8826855Z From : https://nvidia.github.io/nvidia-container-runtime/gpgkey 2023-01-11T21:10:59.1042309Z Retrieving key from https://nvidia.github.io/nvidia-docker/gpgkey 2023-01-11T21:10:59.1147621Z Importing GPG key 0xF796ECB0: 2023-01-11T21:10:59.1148104Z Userid : "NVIDIA CORPORATION (Open Source Projects) " 2023-01-11T21:10:59.1148589Z Fingerprint: c95b 321b 61e8 8c18 09c4 f759 ddca e044 f796 ecb0 2023-01-11T21:10:59.1149185Z From : https://nvidia.github.io/nvidia-docker/gpgkey 
2023-01-11T21:11:00.8113955Z Resolving Dependencies 2023-01-11T21:11:00.8118741Z --> Running transaction check 2023-01-11T21:11:00.8119226Z ---> Package nvidia-docker2.noarch 0:2.11.0-1 will be installed 2023-01-11T21:11:00.8144932Z --> Processing Dependency: nvidia-container-toolkit >= 1.10.0-1 for package: nvidia-docker2-2.11.0-1.noarch 2023-01-11T21:11:00.9027422Z --> Running transaction check 2023-01-11T21:11:00.9028025Z ---> Package nvidia-container-toolkit.x86_64 0:1.11.0-1 will be installed 2023-01-11T21:11:00.9171628Z --> Processing Dependency: nvidia-container-toolkit-base = 1.11.0-1 for package: nvidia-container-toolkit-1.11.0-1.x86_64 2023-01-11T21:11:00.9182686Z --> Processing Dependency: libnvidia-container-tools < 2.0.0 for package: nvidia-container-toolkit-1.11.0-1.x86_64 2023-01-11T21:11:00.9310907Z --> Processing Dependency: libnvidia-container-tools >= 1.11.0-1 for package: nvidia-container-toolkit-1.11.0-1.x86_64 2023-01-11T21:11:00.9311717Z --> Running transaction check 2023-01-11T21:11:00.9312389Z ---> Package libnvidia-container-tools.x86_64 0:1.11.0-1 will be installed 2023-01-11T21:11:00.9322531Z --> Processing Dependency: libnvidia-container1(x86-64) >= 1.11.0-1 for package: libnvidia-container-tools-1.11.0-1.x86_64 2023-01-11T21:11:00.9350337Z --> Processing Dependency: libnvidia-container.so.1(NVC_1.0)(64bit) for package: libnvidia-container-tools-1.11.0-1.x86_64 2023-01-11T21:11:00.9351143Z --> Processing Dependency: libnvidia-container.so.1()(64bit) for package: libnvidia-container-tools-1.11.0-1.x86_64 2023-01-11T21:11:00.9352582Z ---> Package nvidia-container-toolkit-base.x86_64 0:1.11.0-1 will be installed 2023-01-11T21:11:00.9354340Z --> Running transaction check 2023-01-11T21:11:00.9354999Z ---> Package libnvidia-container1.x86_64 0:1.11.0-1 will be installed 2023-01-11T21:11:01.1787068Z --> Finished Dependency Resolution 2023-01-11T21:11:01.2536408Z 2023-01-11T21:11:01.2537121Z Dependencies Resolved 2023-01-11T21:11:01.2550200Z 
2023-01-11T21:11:01.2550530Z ================================================================================ 2023-01-11T21:11:01.2550900Z Package Arch Version Repository Size 2023-01-11T21:11:01.2551264Z ================================================================================ 2023-01-11T21:11:01.2551527Z Installing: 2023-01-11T21:11:01.2552023Z nvidia-docker2 noarch 2.11.0-1 libnvidia-container 8.7 k 2023-01-11T21:11:01.2552376Z Installing for dependencies: 2023-01-11T21:11:01.2552888Z libnvidia-container-tools x86_64 1.11.0-1 libnvidia-container 49 k 2023-01-11T21:11:01.2553384Z libnvidia-container1 x86_64 1.11.0-1 libnvidia-container 1.0 M 2023-01-11T21:11:01.2553909Z nvidia-container-toolkit x86_64 1.11.0-1 libnvidia-container 780 k 2023-01-11T21:11:01.2554439Z nvidia-container-toolkit-base x86_64 1.11.0-1 libnvidia-container 2.5 M 2023-01-11T21:11:01.2554716Z 2023-01-11T21:11:01.2554827Z Transaction Summary 2023-01-11T21:11:01.2555122Z ================================================================================ 2023-01-11T21:11:01.2555423Z Install 1 Package (+4 Dependent packages) 2023-01-11T21:11:01.2555619Z 2023-01-11T21:11:01.2555745Z Total download size: 4.3 M 2023-01-11T21:11:01.2556013Z Installed size: 12 M 2023-01-11T21:11:01.2556253Z Downloading packages: 2023-01-11T21:11:01.3693366Z -------------------------------------------------------------------------------- 2023-01-11T21:11:01.3694103Z Total 38 MB/s | 4.3 MB 00:00 2023-01-11T21:11:01.3740271Z Running transaction check 2023-01-11T21:11:01.3912544Z Running transaction test 2023-01-11T21:11:01.4076367Z Transaction test succeeded 2023-01-11T21:11:01.4079695Z Running transaction 2023-01-11T21:11:01.8938493Z Installing : nvidia-container-toolkit-base-1.11.0-1.x86_64 1/5 2023-01-11T21:11:01.9265333Z Installing : libnvidia-container1-1.11.0-1.x86_64 2/5 2023-01-11T21:11:02.0320614Z Installing : libnvidia-container-tools-1.11.0-1.x86_64 3/5 2023-01-11T21:11:02.0539932Z Installing : 
nvidia-container-toolkit-1.11.0-1.x86_64 4/5 2023-01-11T21:11:02.0916352Z Installing : nvidia-docker2-2.11.0-1.noarch 5/5 2023-01-11T21:11:02.1024656Z Verifying : libnvidia-container1-1.11.0-1.x86_64 1/5 2023-01-11T21:11:02.1133234Z Verifying : nvidia-container-toolkit-base-1.11.0-1.x86_64 2/5 2023-01-11T21:11:02.1225477Z Verifying : nvidia-container-toolkit-1.11.0-1.x86_64 3/5 2023-01-11T21:11:02.1320214Z Verifying : libnvidia-container-tools-1.11.0-1.x86_64 4/5 2023-01-11T21:11:02.2030040Z Verifying : nvidia-docker2-2.11.0-1.noarch 5/5 2023-01-11T21:11:02.2030803Z 2023-01-11T21:11:02.2030925Z Installed: 2023-01-11T21:11:02.2031327Z nvidia-docker2.noarch 0:2.11.0-1 2023-01-11T21:11:02.2031547Z 2023-01-11T21:11:02.2031676Z Dependency Installed: 2023-01-11T21:11:02.2032103Z libnvidia-container-tools.x86_64 0:1.11.0-1 2023-01-11T21:11:02.2032557Z libnvidia-container1.x86_64 0:1.11.0-1 2023-01-11T21:11:02.2033014Z nvidia-container-toolkit.x86_64 0:1.11.0-1 2023-01-11T21:11:02.2033501Z nvidia-container-toolkit-base.x86_64 0:1.11.0-1 2023-01-11T21:11:02.2033862Z 2023-01-11T21:11:02.2033982Z Complete! 2023-01-11T21:11:02.3084984Z + sudo systemctl restart docker 2023-01-11T21:11:10.8964205Z Command completed after 1 attempt(s). 
2023-01-11T21:11:10.9024437Z ##[group]Run python3 -m pip install psutil==5.9.1 2023-01-11T21:11:10.9024842Z python3 -m pip install psutil==5.9.1 2023-01-11T21:11:10.9025184Z python3 -m pip install pynvml==11.4.1 2023-01-11T21:11:10.9025540Z python3 -m tools.stats.monitor > usage_log.txt 2>&1 & 2023-01-11T21:11:10.9025902Z echo "monitor-script-pid=${!}" >> "${GITHUB_OUTPUT}" 2023-01-11T21:11:10.9039186Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T21:11:10.9039492Z env: 2023-01-11T21:11:10.9039718Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:11:10.9039988Z GPU_FLAG: --gpus all 2023-01-11T21:11:10.9040245Z ##[endgroup] 2023-01-11T21:11:11.4187530Z Defaulting to user installation because normal site-packages is not writeable 2023-01-11T21:11:11.8286391Z Collecting psutil==5.9.1 2023-01-11T21:11:11.8526730Z Downloading psutil-5.9.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (281 kB) 2023-01-11T21:11:11.9193900Z Installing collected packages: psutil 2023-01-11T21:11:12.0737543Z Successfully installed psutil-5.9.1 2023-01-11T21:11:12.5404454Z Defaulting to user installation because normal site-packages is not writeable 2023-01-11T21:11:12.6096603Z Collecting pynvml==11.4.1 2023-01-11T21:11:12.6276055Z Downloading pynvml-11.4.1-py3-none-any.whl (46 kB) 2023-01-11T21:11:12.6764990Z Installing collected packages: pynvml 2023-01-11T21:11:12.7302194Z Successfully installed pynvml-11.4.1 2023-01-11T21:11:12.7774831Z Prepare all required actions 2023-01-11T21:11:12.7775193Z Getting action download info 2023-01-11T21:11:13.0480876Z Download action repository 'seemethere/download-artifact-s3@v4' (SHA:4a8bfae15cc25cc0785c1603ee87a9da8fd442ea) 2023-01-11T21:11:13.3350906Z Download action repository 'actions/download-artifact@v3' (SHA:9bc31d5ccc31df68ecc42ccf4149144866c47d8a) 2023-01-11T21:11:13.5395624Z ##[group]Run ./.github/actions/download-build-artifacts 2023-01-11T21:11:13.5395929Z with: 
2023-01-11T21:11:13.5396194Z name: linux-bionic-cuda11.6-py3.10-gcc7 2023-01-11T21:11:13.5396475Z env: 2023-01-11T21:11:13.5396725Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:11:13.5396977Z GPU_FLAG: --gpus all 2023-01-11T21:11:13.5397229Z ##[endgroup] 2023-01-11T21:11:13.5432460Z ##[group]Run seemethere/download-artifact-s3@v4 2023-01-11T21:11:13.5432759Z with: 2023-01-11T21:11:13.5433043Z name: linux-bionic-cuda11.6-py3.10-gcc7 2023-01-11T21:11:13.5433335Z s3-bucket: gha-artifacts 2023-01-11T21:11:13.5433669Z region: us-east-1 2023-01-11T21:11:13.5433890Z env: 2023-01-11T21:11:13.5434135Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:11:13.5434404Z GPU_FLAG: --gpus all 2023-01-11T21:11:13.5434633Z ##[endgroup] 2023-01-11T21:11:14.1356222Z Found 1 objects with prefix pytorch/pytorch/3896099317/linux-bionic-cuda11.6-py3.10-gcc7/ 2023-01-11T21:11:14.1356829Z Starting download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/artifacts.zip 2023-01-11T21:11:20.5504902Z Finished download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/artifacts.zip 2023-01-11T21:11:20.5505470Z 2023-01-11T21:11:20.5529137Z ##[warning]The `set-output` command is deprecated and will be disabled soon. Please upgrade to using Environment Files. 
For more information see: https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/ 2023-01-11T21:11:20.5540144Z Artifact download has finished successfully 2023-01-11T21:11:20.5797402Z ##[group]Run unzip -o artifacts.zip 2023-01-11T21:11:20.5797710Z unzip -o artifacts.zip 2023-01-11T21:11:20.5811223Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T21:11:20.5811522Z env: 2023-01-11T21:11:20.5811764Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:11:20.5812016Z GPU_FLAG: --gpus all 2023-01-11T21:11:20.5812264Z ##[endgroup] 2023-01-11T21:11:20.5890924Z Archive: artifacts.zip 2023-01-11T21:11:20.5892816Z creating: dist/ 2023-01-11T21:11:22.6791191Z inflating: dist/torch-2.0.0a0+git8419ddd-cp310-cp310-linux_x86_64.whl 2023-01-11T21:11:22.6791683Z creating: build/custom_test_artifacts/ 2023-01-11T21:11:22.6792149Z creating: build/custom_test_artifacts/custom-op-build/ 2023-01-11T21:11:22.6792605Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/ 2023-01-11T21:11:22.6799056Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeOutput.log 2023-01-11T21:11:22.6800177Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/ 2023-01-11T21:11:22.6801269Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeSystem.cmake 2023-01-11T21:11:22.6801819Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdC/ 2023-01-11T21:11:22.6802375Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdC/tmp/ 2023-01-11T21:11:22.6803745Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdC/CMakeCCompilerId.c 2023-01-11T21:11:22.6805417Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdC/a.out 2023-01-11T21:11:22.6806537Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCXX/ 2023-01-11T21:11:22.6807117Z 
creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCXX/tmp/ 2023-01-11T21:11:22.6808509Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCXX/CMakeCXXCompilerId.cpp 2023-01-11T21:11:22.6810369Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCXX/a.out 2023-01-11T21:11:22.6812344Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_C.bin 2023-01-11T21:11:22.6813677Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeCCompiler.cmake 2023-01-11T21:11:22.6815051Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_CXX.bin 2023-01-11T21:11:22.6816213Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeCXXCompiler.cmake 2023-01-11T21:11:22.6817103Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/ 2023-01-11T21:11:22.6817662Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/ 2023-01-11T21:11:22.6870565Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2023-01-11T21:11:22.6872164Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2023-01-11T21:11:22.6873769Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2023-01-11T21:11:22.6875410Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2023-01-11T21:11:22.6876743Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2023-01-11T21:11:22.6877647Z inflating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2023-01-11T21:11:22.6878331Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2023-01-11T21:11:22.6879002Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2023-01-11T21:11:22.6879700Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2023-01-11T21:11:22.6918757Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2023-01-11T21:11:22.6959957Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2023-01-11T21:11:22.6961477Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2023-01-11T21:11:22.6962997Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2023-01-11T21:11:22.6964394Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.reg.c 2023-01-11T21:11:22.6965554Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.fatbin 2023-01-11T21:11:22.6966700Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2023-01-11T21:11:22.6967649Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.o 2023-01-11T21:11:22.6968285Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/CMakeCUDACompilerId.cu 2023-01-11T21:11:22.7040064Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/a.out 2023-01-11T21:11:22.7112869Z inflating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_CUDA.bin 2023-01-11T21:11:22.7114305Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeCUDACompiler.cmake 2023-01-11T21:11:22.7115478Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeTmp/ 2023-01-11T21:11:22.7116879Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeError.log 2023-01-11T21:11:22.7118192Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/cmake.check_cache 2023-01-11T21:11:22.7119379Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/ 2023-01-11T21:11:22.7120744Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.ts 2023-01-11T21:11:22.7122154Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.make 2023-01-11T21:11:22.7122929Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/depend.make 2023-01-11T21:11:22.7123503Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/link.txt 2023-01-11T21:11:22.7124078Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/cmake_clean.cmake 2023-01-11T21:11:22.7124663Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/build.make 2023-01-11T21:11:22.7125231Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/DependInfo.cmake 2023-01-11T21:11:22.7125815Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/flags.make 2023-01-11T21:11:22.7126387Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/progress.make 2023-01-11T21:11:22.7143675Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o.d 2023-01-11T21:11:22.7260817Z inflating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o 2023-01-11T21:11:22.7262308Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/ 2023-01-11T21:11:22.7263611Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.ts 2023-01-11T21:11:22.7264970Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.make 2023-01-11T21:11:22.7266345Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/depend.make 2023-01-11T21:11:22.7267649Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/link.txt 2023-01-11T21:11:22.7268270Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/cmake_clean.cmake 2023-01-11T21:11:22.7268865Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/build.make 2023-01-11T21:11:22.7269460Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/DependInfo.cmake 2023-01-11T21:11:22.7270075Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/flags.make 2023-01-11T21:11:22.7270669Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/progress.make 2023-01-11T21:11:22.7288591Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o.d 2023-01-11T21:11:22.7375281Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o 2023-01-11T21:11:22.7377013Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeDirectoryInformation.cmake 2023-01-11T21:11:22.7378373Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/TargetDirectories.txt 2023-01-11T21:11:22.7379653Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/progress.marks 
2023-01-11T21:11:22.7380424Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile2 2023-01-11T21:11:22.7380951Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile.cmake 2023-01-11T21:11:22.7381535Z inflating: build/custom_test_artifacts/custom-op-build/detect_cuda_version.cc 2023-01-11T21:11:22.7382796Z inflating: build/custom_test_artifacts/custom-op-build/CMakeCache.txt 2023-01-11T21:11:22.7383926Z inflating: build/custom_test_artifacts/custom-op-build/Makefile 2023-01-11T21:11:22.7384808Z inflating: build/custom_test_artifacts/custom-op-build/cmake_install.cmake 2023-01-11T21:11:22.7478878Z inflating: build/custom_test_artifacts/custom-op-build/libcustom_ops.so 2023-01-11T21:11:22.7543772Z inflating: build/custom_test_artifacts/custom-op-build/test_custom_ops 2023-01-11T21:11:22.7544806Z creating: build/custom_test_artifacts/jit-hook-build/ 2023-01-11T21:11:22.7545297Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/ 2023-01-11T21:11:22.7551459Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeOutput.log 2023-01-11T21:11:22.7552626Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/ 2023-01-11T21:11:22.7553371Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeSystem.cmake 2023-01-11T21:11:22.7553917Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdC/ 2023-01-11T21:11:22.7554435Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdC/tmp/ 2023-01-11T21:11:22.7555991Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdC/CMakeCCompilerId.c 2023-01-11T21:11:22.7557281Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdC/a.out 2023-01-11T21:11:22.7558206Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCXX/ 2023-01-11T21:11:22.7558762Z creating: 
build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCXX/tmp/ 2023-01-11T21:11:22.7560283Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCXX/CMakeCXXCompilerId.cpp 2023-01-11T21:11:22.7562018Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCXX/a.out 2023-01-11T21:11:22.7563638Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_C.bin 2023-01-11T21:11:22.7564877Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeCCompiler.cmake 2023-01-11T21:11:22.7566281Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_CXX.bin 2023-01-11T21:11:22.7567609Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeCXXCompiler.cmake 2023-01-11T21:11:22.7568189Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/ 2023-01-11T21:11:22.7568747Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/ 2023-01-11T21:11:22.7622328Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2023-01-11T21:11:22.7623906Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2023-01-11T21:11:22.7625499Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2023-01-11T21:11:22.7627158Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2023-01-11T21:11:22.7628525Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2023-01-11T21:11:22.7629217Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 
2023-01-11T21:11:22.7629882Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2023-01-11T21:11:22.7630552Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2023-01-11T21:11:22.7631363Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2023-01-11T21:11:22.7670260Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2023-01-11T21:11:22.7711408Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2023-01-11T21:11:22.7712938Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2023-01-11T21:11:22.7714449Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2023-01-11T21:11:22.7715821Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.reg.c 2023-01-11T21:11:22.7717006Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.fatbin 2023-01-11T21:11:22.7718280Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2023-01-11T21:11:22.7719165Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.o 2023-01-11T21:11:22.7719790Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/CMakeCUDACompilerId.cu 2023-01-11T21:11:22.7791393Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/a.out 2023-01-11T21:11:22.7864451Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_CUDA.bin 2023-01-11T21:11:22.7865869Z 
inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeCUDACompiler.cmake 2023-01-11T21:11:22.7867266Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeTmp/ 2023-01-11T21:11:22.7868411Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeError.log 2023-01-11T21:11:22.7869599Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/cmake.check_cache 2023-01-11T21:11:22.7870800Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/ 2023-01-11T21:11:22.7872075Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.ts 2023-01-11T21:11:22.7873450Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.make 2023-01-11T21:11:22.7874282Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/depend.make 2023-01-11T21:11:22.7874856Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/link.txt 2023-01-11T21:11:22.7875441Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/cmake_clean.cmake 2023-01-11T21:11:22.7876002Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/build.make 2023-01-11T21:11:22.7876597Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/DependInfo.cmake 2023-01-11T21:11:22.7877178Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/flags.make 2023-01-11T21:11:22.7877756Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/progress.make 2023-01-11T21:11:22.7894803Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o.d 2023-01-11T21:11:22.7961789Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o 2023-01-11T21:11:22.7963201Z inflating: 
build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeDirectoryInformation.cmake 2023-01-11T21:11:22.7964467Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/TargetDirectories.txt 2023-01-11T21:11:22.7965657Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/progress.marks 2023-01-11T21:11:22.7966826Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile2 2023-01-11T21:11:22.7967379Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile.cmake 2023-01-11T21:11:22.7967917Z inflating: build/custom_test_artifacts/jit-hook-build/detect_cuda_version.cc 2023-01-11T21:11:22.7968952Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeCache.txt 2023-01-11T21:11:22.7970004Z inflating: build/custom_test_artifacts/jit-hook-build/Makefile 2023-01-11T21:11:22.7971051Z inflating: build/custom_test_artifacts/jit-hook-build/cmake_install.cmake 2023-01-11T21:11:22.8021931Z inflating: build/custom_test_artifacts/jit-hook-build/test_jit_hooks 2023-01-11T21:11:22.8023006Z creating: build/custom_test_artifacts/custom-backend-build/ 2023-01-11T21:11:22.8023545Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/ 2023-01-11T21:11:22.8029546Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeOutput.log 2023-01-11T21:11:22.8030797Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/ 2023-01-11T21:11:22.8031596Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeSystem.cmake 2023-01-11T21:11:22.8032176Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdC/ 2023-01-11T21:11:22.8032877Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdC/tmp/ 2023-01-11T21:11:22.8034187Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdC/CMakeCCompilerId.c 2023-01-11T21:11:22.8035596Z inflating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdC/a.out 2023-01-11T21:11:22.8036558Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCXX/ 2023-01-11T21:11:22.8037139Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCXX/tmp/ 2023-01-11T21:11:22.8038551Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCXX/CMakeCXXCompilerId.cpp 2023-01-11T21:11:22.8040340Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCXX/a.out 2023-01-11T21:11:22.8041762Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_C.bin 2023-01-11T21:11:22.8042828Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeCCompiler.cmake 2023-01-11T21:11:22.8044173Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_CXX.bin 2023-01-11T21:11:22.8045652Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeCXXCompiler.cmake 2023-01-11T21:11:22.8046298Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/ 2023-01-11T21:11:22.8046880Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/ 2023-01-11T21:11:22.8100571Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2023-01-11T21:11:22.8102241Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2023-01-11T21:11:22.8103915Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2023-01-11T21:11:22.8105625Z inflating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2023-01-11T21:11:22.8106837Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2023-01-11T21:11:22.8107547Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2023-01-11T21:11:22.8108404Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2023-01-11T21:11:22.8109135Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2023-01-11T21:11:22.8109838Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2023-01-11T21:11:22.8148488Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2023-01-11T21:11:22.8189634Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2023-01-11T21:11:22.8191258Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2023-01-11T21:11:22.8192789Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2023-01-11T21:11:22.8194259Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.reg.c 2023-01-11T21:11:22.8195335Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.fatbin 2023-01-11T21:11:22.8196755Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2023-01-11T21:11:22.8197479Z inflating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.o 2023-01-11T21:11:22.8198117Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/CMakeCUDACompilerId.cu 2023-01-11T21:11:22.8269705Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/a.out 2023-01-11T21:11:22.8342874Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_CUDA.bin 2023-01-11T21:11:22.8344309Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeCUDACompiler.cmake 2023-01-11T21:11:22.8345571Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeTmp/ 2023-01-11T21:11:22.8346779Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeError.log 2023-01-11T21:11:22.8348055Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/cmake.check_cache 2023-01-11T21:11:22.8349341Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/ 2023-01-11T21:11:22.8350680Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.ts 2023-01-11T21:11:22.8352130Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.make 2023-01-11T21:11:22.8352780Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/depend.make 2023-01-11T21:11:22.8353392Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/link.txt 2023-01-11T21:11:22.8354009Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/cmake_clean.cmake 2023-01-11T21:11:22.8354626Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/build.make 2023-01-11T21:11:22.8355251Z inflating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/DependInfo.cmake 2023-01-11T21:11:22.8355866Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/flags.make 2023-01-11T21:11:22.8356908Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/progress.make 2023-01-11T21:11:22.8358073Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o.d 2023-01-11T21:11:22.8511393Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o 2023-01-11T21:11:22.8512804Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/ 2023-01-11T21:11:22.8514247Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.ts 2023-01-11T21:11:22.8515672Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.make 2023-01-11T21:11:22.8517130Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/depend.make 2023-01-11T21:11:22.8518079Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/link.txt 2023-01-11T21:11:22.8518698Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/cmake_clean.cmake 2023-01-11T21:11:22.8519325Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/build.make 2023-01-11T21:11:22.8519960Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/DependInfo.cmake 2023-01-11T21:11:22.8520589Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/flags.make 2023-01-11T21:11:22.8521199Z inflating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/progress.make 2023-01-11T21:11:22.8539144Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o.d 2023-01-11T21:11:22.8601394Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o 2023-01-11T21:11:22.8602901Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeDirectoryInformation.cmake 2023-01-11T21:11:22.8604244Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/TargetDirectories.txt 2023-01-11T21:11:22.8605573Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/progress.marks 2023-01-11T21:11:22.8606343Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile2 2023-01-11T21:11:22.8606895Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile.cmake 2023-01-11T21:11:22.8607494Z inflating: build/custom_test_artifacts/custom-backend-build/detect_cuda_version.cc 2023-01-11T21:11:22.8608611Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeCache.txt 2023-01-11T21:11:22.8609740Z inflating: build/custom_test_artifacts/custom-backend-build/Makefile 2023-01-11T21:11:22.8610716Z inflating: build/custom_test_artifacts/custom-backend-build/cmake_install.cmake 2023-01-11T21:11:22.8733313Z inflating: build/custom_test_artifacts/custom-backend-build/libcustom_backend.so 2023-01-11T21:11:22.8780808Z inflating: build/custom_test_artifacts/custom-backend-build/test_custom_backend 2023-01-11T21:11:22.8781595Z creating: build/lib/ 2023-01-11T21:11:22.8782191Z inflating: build/lib/libclog.a 2023-01-11T21:11:22.8851420Z inflating: build/lib/libgtest.a 2023-01-11T21:11:22.8861878Z inflating: build/lib/libpthreadpool.a 2023-01-11T21:11:22.8967860Z inflating: build/lib/libprotobuf-lite.a 2023-01-11T21:11:22.9064186Z inflating: 
build/lib/libbenchmark.a 2023-01-11T21:11:22.9073296Z inflating: build/lib/libittnotify.a 2023-01-11T21:11:22.9149902Z inflating: build/lib/libasmjit.a 2023-01-11T21:11:22.9181946Z inflating: build/lib/libtensorpipe_uv.a 2023-01-11T21:11:22.9715069Z inflating: build/lib/libprotobuf.a 2023-01-11T21:11:22.9855636Z inflating: build/lib/libgloo.a 2023-01-11T21:11:22.9888513Z inflating: build/lib/libfmt.a 2023-01-11T21:11:22.9889928Z inflating: build/lib/libcaffe2_nvrtc.so 2023-01-11T21:11:22.9890768Z inflating: build/lib/libfoxi_loader.a 2023-01-11T21:11:22.9973821Z inflating: build/lib/libc10.so 2023-01-11T21:11:22.9975036Z inflating: build/lib/libtorch_global_deps.so 2023-01-11T21:11:22.9985247Z inflating: build/lib/libcpuinfo.a 2023-01-11T21:11:22.9994209Z inflating: build/lib/libcpuinfo_internals.a 2023-01-11T21:11:23.0563420Z inflating: build/lib/libprotoc.a 2023-01-11T21:11:23.0579446Z inflating: build/lib/libqnnpack.a 2023-01-11T21:11:23.0602309Z inflating: build/lib/libpytorch_qnnpack.a 2023-01-11T21:11:23.0604918Z inflating: build/lib/libnnpack_reference_layers.a 2023-01-11T21:11:23.0627112Z inflating: build/lib/libnnpack.a 2023-01-11T21:11:23.0644738Z inflating: build/lib/libgmock.a 2023-01-11T21:11:23.0645519Z inflating: build/lib/libgtest_main.a 2023-01-11T21:11:23.0646391Z inflating: build/lib/libbenchmark_main.a 2023-01-11T21:11:23.0788491Z inflating: build/lib/libXNNPACK.a 2023-01-11T21:11:24.0551392Z inflating: build/lib/libdnnl.a 2023-01-11T21:11:24.1205973Z inflating: build/lib/libtensorpipe.a 2023-01-11T21:11:24.1260643Z inflating: build/lib/libc10_cuda.so 2023-01-11T21:11:24.2800720Z inflating: build/lib/libfbgemm.a 2023-01-11T21:11:24.2801361Z inflating: build/lib/libgmock_main.a 2023-01-11T21:11:24.3956381Z inflating: build/lib/libdnnl_graph.a 2023-01-11T21:11:24.4471976Z inflating: build/lib/libkineto.a 2023-01-11T21:11:24.4761360Z inflating: build/lib/libtensorpipe_cuda.a 2023-01-11T21:11:24.4806541Z inflating: build/lib/libcaffe2_protos.a 
2023-01-11T21:11:24.4854899Z inflating: build/lib/libonnx_proto.a 2023-01-11T21:11:24.5533307Z inflating: build/lib/libonnx.a 2023-01-11T21:11:24.5964872Z inflating: build/lib/libgloo_cuda.a 2023-01-11T21:11:26.9761476Z inflating: build/lib/libtorch_cpu.so 2023-01-11T21:11:26.9771751Z inflating: build/lib/libunbox_lib.a 2023-01-11T21:11:29.1154246Z inflating: build/lib/libtorch_cuda.so 2023-01-11T21:11:29.1156559Z inflating: build/lib/libtorch.so 2023-01-11T21:11:29.1157602Z inflating: build/lib/libc10d_cuda_test.so 2023-01-11T21:11:30.1072774Z inflating: build/lib/libtorch_cuda_linalg.so 2023-01-11T21:11:30.1096832Z inflating: build/lib/libjitbackend_test.so 2023-01-11T21:11:30.1128077Z inflating: build/lib/libbackend_with_compiler.so 2023-01-11T21:11:30.1189368Z inflating: build/lib/libtorchbind_test.so 2023-01-11T21:11:30.1194189Z inflating: build/lib/libshm.so 2023-01-11T21:11:30.3048137Z inflating: build/lib/libtorch_python.so 2023-01-11T21:11:30.3088128Z inflating: build/lib/libnnapi_backend.so 2023-01-11T21:11:30.3088780Z creating: build/bin/ 2023-01-11T21:11:30.3143132Z inflating: build/bin/c10_CompileTimeFunctionPointer_test 2023-01-11T21:11:30.3200692Z inflating: build/bin/c10_DeviceGuard_test 2023-01-11T21:11:30.3256736Z inflating: build/bin/c10_Device_test 2023-01-11T21:11:30.3321426Z inflating: build/bin/c10_DispatchKeySet_test 2023-01-11T21:11:30.3374608Z inflating: build/bin/c10_StreamGuard_test 2023-01-11T21:11:30.3429467Z inflating: build/bin/c10_SymInt_test 2023-01-11T21:11:30.3491046Z inflating: build/bin/c10_InlineDeviceGuard_test 2023-01-11T21:11:30.3553021Z inflating: build/bin/c10_InlineStreamGuard_test 2023-01-11T21:11:30.3616004Z inflating: build/bin/c10_SizesAndStrides_test 2023-01-11T21:11:30.3669266Z inflating: build/bin/c10_Array_test 2023-01-11T21:11:30.3728197Z inflating: build/bin/c10_Bitset_test 2023-01-11T21:11:30.3785270Z inflating: build/bin/c10_C++17_test 2023-01-11T21:11:30.3838540Z inflating: build/bin/c10_ConstexprCrc_test 
2023-01-11T21:11:30.3892926Z inflating: build/bin/c10_DeadlockDetection_test 2023-01-11T21:11:30.3948008Z inflating: build/bin/c10_Half_test 2023-01-11T21:11:30.4010890Z inflating: build/bin/c10_LeftRight_test 2023-01-11T21:11:30.4079978Z inflating: build/bin/c10_Metaprogramming_test 2023-01-11T21:11:30.4240586Z inflating: build/bin/c10_SmallVectorTest 2023-01-11T21:11:30.4296046Z inflating: build/bin/c10_Synchronized_test 2023-01-11T21:11:30.4360063Z inflating: build/bin/c10_ThreadLocal_test 2023-01-11T21:11:30.4418921Z inflating: build/bin/c10_TypeIndex_test 2023-01-11T21:11:30.4474786Z inflating: build/bin/c10_TypeList_test 2023-01-11T21:11:30.4528075Z inflating: build/bin/c10_TypeTraits_test 2023-01-11T21:11:30.4586254Z inflating: build/bin/c10_accumulate_test 2023-01-11T21:11:30.4648366Z inflating: build/bin/c10_bfloat16_test 2023-01-11T21:11:30.4709943Z inflating: build/bin/c10_complex_math_test 2023-01-11T21:11:30.4770820Z inflating: build/bin/c10_complex_test 2023-01-11T21:11:30.4890691Z inflating: build/bin/c10_either_test 2023-01-11T21:11:30.4949353Z inflating: build/bin/c10_exception_test 2023-01-11T21:11:30.5004729Z inflating: build/bin/c10_flags_test 2023-01-11T21:11:30.5189355Z inflating: build/bin/c10_intrusive_ptr_test 2023-01-11T21:11:30.5245336Z inflating: build/bin/c10_irange_test 2023-01-11T21:11:30.5308777Z inflating: build/bin/c10_logging_test 2023-01-11T21:11:30.5390821Z inflating: build/bin/c10_optional_test 2023-01-11T21:11:30.5459114Z inflating: build/bin/c10_ordered_preserving_dict_test 2023-01-11T21:11:30.5520074Z inflating: build/bin/c10_registry_test 2023-01-11T21:11:30.5584423Z inflating: build/bin/c10_string_view_test 2023-01-11T21:11:30.5641499Z inflating: build/bin/c10_tempfile_test 2023-01-11T21:11:30.5703572Z inflating: build/bin/c10_typeid_test 2023-01-11T21:11:30.5764519Z inflating: build/bin/c10_intrusive_ptr_benchmark 2023-01-11T21:11:30.6287106Z inflating: build/bin/protoc-3.13.0.0 2023-01-11T21:11:30.6808064Z inflating: 
build/bin/protoc 2023-01-11T21:11:30.6867044Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_1_var_test 2023-01-11T21:11:30.6926117Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_catches_stream 2023-01-11T21:11:30.6984890Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_catches_thread_and_block_and_device 2023-01-11T21:11:30.7042624Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_from_2_processes 2023-01-11T21:11:30.7101861Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_blocks_and_threads 2023-01-11T21:11:30.7155217Z inflating: build/bin/c10_cuda_CUDATest 2023-01-11T21:11:30.7214324Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_multiple_blocks 2023-01-11T21:11:30.7273215Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_same_block 2023-01-11T21:11:30.7597178Z inflating: build/bin/vec_test_all_types_DEFAULT 2023-01-11T21:11:30.7958030Z inflating: build/bin/vec_test_all_types_AVX2 2023-01-11T21:11:30.8017890Z inflating: build/bin/FileStoreTest 2023-01-11T21:11:30.8083787Z inflating: build/bin/TCPStoreTest 2023-01-11T21:11:30.8143823Z inflating: build/bin/HashStoreTest 2023-01-11T21:11:30.8159991Z inflating: build/bin/ProcessGroupMPITest 2023-01-11T21:11:30.8223311Z inflating: build/bin/test_edge_op_registration 2023-01-11T21:11:30.8226553Z inflating: build/bin/example_allreduce 2023-01-11T21:11:30.8285127Z inflating: build/bin/Dimname_test 2023-01-11T21:11:30.8365785Z inflating: build/bin/Dict_test 2023-01-11T21:11:30.8436457Z inflating: build/bin/MaybeOwned_test 2023-01-11T21:11:30.8499747Z inflating: build/bin/NamedTensor_test 2023-01-11T21:11:30.8565216Z inflating: build/bin/apply_utils_test 2023-01-11T21:11:30.8630283Z inflating: build/bin/atest 2023-01-11T21:11:30.8697265Z inflating: build/bin/basic 2023-01-11T21:11:30.8757070Z inflating: build/bin/broadcast_test 2023-01-11T21:11:30.8821353Z inflating: build/bin/cpu_generator_test 2023-01-11T21:11:30.8876544Z inflating: 
build/bin/dispatch_key_set_test 2023-01-11T21:11:30.8934742Z inflating: build/bin/cpu_profiling_allocator_test 2023-01-11T21:11:30.8989564Z inflating: build/bin/dlconvertor_test 2023-01-11T21:11:30.9053915Z inflating: build/bin/extension_backend_test 2023-01-11T21:11:30.9115404Z inflating: build/bin/half_test 2023-01-11T21:11:30.9170069Z inflating: build/bin/lazy_tensor_test 2023-01-11T21:11:30.9230002Z inflating: build/bin/math_kernel_test 2023-01-11T21:11:30.9289559Z inflating: build/bin/memory_format_test 2023-01-11T21:11:30.9349199Z inflating: build/bin/memory_overlapping_test 2023-01-11T21:11:30.9405431Z inflating: build/bin/operator_name_test 2023-01-11T21:11:30.9467145Z inflating: build/bin/native_test 2023-01-11T21:11:30.9522538Z inflating: build/bin/operators_test 2023-01-11T21:11:30.9581275Z inflating: build/bin/packedtensoraccessor_test 2023-01-11T21:11:30.9644878Z inflating: build/bin/quantized_test 2023-01-11T21:11:30.9701629Z inflating: build/bin/reportMemoryUsage_test 2023-01-11T21:11:30.9763874Z inflating: build/bin/scalar_tensor_test 2023-01-11T21:11:30.9827249Z inflating: build/bin/scalar_test 2023-01-11T21:11:30.9885448Z inflating: build/bin/stride_properties_test 2023-01-11T21:11:30.9972697Z inflating: build/bin/tensor_iterator_test 2023-01-11T21:11:31.0028686Z inflating: build/bin/reduce_ops_test 2023-01-11T21:11:31.0101405Z inflating: build/bin/pow_test 2023-01-11T21:11:31.0104747Z inflating: build/bin/thread_init_test 2023-01-11T21:11:31.0166630Z inflating: build/bin/test_parallel 2023-01-11T21:11:31.0227680Z inflating: build/bin/type_ptr_test 2023-01-11T21:11:31.0282397Z inflating: build/bin/variant_test 2023-01-11T21:11:31.0341222Z inflating: build/bin/undefined_tensor_test 2023-01-11T21:11:31.0399253Z inflating: build/bin/mobile_memory_cleanup 2023-01-11T21:11:31.0400387Z inflating: build/bin/verify_api_visibility 2023-01-11T21:11:31.0476844Z inflating: build/bin/legacy_vmap_test 2023-01-11T21:11:31.0533567Z inflating: 
build/bin/weakref_test 2023-01-11T21:11:31.0590154Z inflating: build/bin/wrapdim_test 2023-01-11T21:11:31.0645162Z inflating: build/bin/xla_tensor_test 2023-01-11T21:11:31.0711650Z inflating: build/bin/IListRef_test 2023-01-11T21:11:31.0831489Z inflating: build/bin/List_test 2023-01-11T21:11:31.0903481Z inflating: build/bin/KernelFunction_test 2023-01-11T21:11:31.1037211Z inflating: build/bin/kernel_function_legacy_test 2023-01-11T21:11:31.1142747Z inflating: build/bin/kernel_function_test 2023-01-11T21:11:31.1239473Z inflating: build/bin/cpu_rng_test 2023-01-11T21:11:31.1380272Z inflating: build/bin/kernel_lambda_legacy_test 2023-01-11T21:11:31.1483833Z inflating: build/bin/ivalue_test 2023-01-11T21:11:31.1598742Z inflating: build/bin/kernel_lambda_test 2023-01-11T21:11:31.1665533Z inflating: build/bin/kernel_stackbased_test 2023-01-11T21:11:31.1722081Z inflating: build/bin/CppSignature_test 2023-01-11T21:11:31.1827166Z inflating: build/bin/make_boxed_from_unboxed_functor_test 2023-01-11T21:11:31.1880129Z inflating: build/bin/op_allowlist_test 2023-01-11T21:11:31.1948507Z inflating: build/bin/type_test 2023-01-11T21:11:31.2010845Z inflating: build/bin/backend_fallback_test 2023-01-11T21:11:31.2325720Z inflating: build/bin/op_registration_test 2023-01-11T21:11:31.2385660Z inflating: build/bin/inline_container_test 2023-01-11T21:11:31.2443861Z inflating: build/bin/cuda_apply_test 2023-01-11T21:11:31.2510241Z inflating: build/bin/cuda_atomic_ops_test 2023-01-11T21:11:31.2569823Z inflating: build/bin/cuda_caching_host_allocator_test 2023-01-11T21:11:31.2647946Z inflating: build/bin/cuda_complex_math_test 2023-01-11T21:11:31.2712522Z inflating: build/bin/cuda_complex_test 2023-01-11T21:11:31.2766795Z inflating: build/bin/cuda_device_test 2023-01-11T21:11:31.2832282Z inflating: build/bin/cuda_cub_test 2023-01-11T21:11:31.2887585Z inflating: build/bin/cuda_dlconvertor_test 2023-01-11T21:11:31.2961713Z inflating: build/bin/cuda_distributions_test 
2023-01-11T21:11:31.3017551Z inflating: build/bin/cuda_integer_divider_test 2023-01-11T21:11:31.3082333Z inflating: build/bin/cuda_generator_test 2023-01-11T21:11:31.3136525Z inflating: build/bin/cuda_half_test 2023-01-11T21:11:31.3203840Z inflating: build/bin/cuda_stream_test 2023-01-11T21:11:31.3262556Z inflating: build/bin/cuda_reportMemoryUsage_test 2023-01-11T21:11:31.3316611Z inflating: build/bin/cuda_optional_test 2023-01-11T21:11:31.3373262Z inflating: build/bin/cuda_packedtensoraccessor_test 2023-01-11T21:11:31.3427181Z inflating: build/bin/cuda_cudnn_test 2023-01-11T21:11:31.3485068Z inflating: build/bin/cuda_vectorized_test 2023-01-11T21:11:31.3503251Z inflating: build/bin/tutorial_tensorexpr 2023-01-11T21:11:31.3575109Z inflating: build/bin/ProcessGroupGlooTest 2023-01-11T21:11:31.3639738Z inflating: build/bin/ProcessGroupGlooAsyncTest 2023-01-11T21:11:31.3707761Z inflating: build/bin/ProcessGroupNCCLTest 2023-01-11T21:11:31.3771907Z inflating: build/bin/ProcessGroupNCCLErrorsTest 2023-01-11T21:11:31.3830832Z inflating: build/bin/ProcessGroupUCCTest 2023-01-11T21:11:31.3890488Z inflating: build/bin/test_dist_autograd 2023-01-11T21:11:31.3968702Z inflating: build/bin/test_cpp_rpc 2023-01-11T21:11:31.3971386Z inflating: build/bin/parallel_benchmark 2023-01-11T21:11:31.4046958Z inflating: build/bin/test_mobile_nnc 2023-01-11T21:11:31.4058720Z inflating: build/bin/aot_model_compiler_test 2023-01-11T21:11:31.4967354Z inflating: build/bin/test_tensorexpr 2023-01-11T21:11:31.4973343Z inflating: build/bin/torch_shm_manager 2023-01-11T21:11:31.5362065Z inflating: build/bin/test_lazy 2023-01-11T21:11:31.6681534Z inflating: build/bin/test_api 2023-01-11T21:11:31.7883617Z inflating: build/bin/test_jit 2023-01-11T21:11:31.7885580Z inflating: .pytorch-test-times.json 2023-01-11T21:11:31.7916545Z ##[group]Run df -H 2023-01-11T21:11:31.7916781Z df -H 2023-01-11T21:11:31.7930121Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T21:11:31.7930418Z 
env: 2023-01-11T21:11:31.7930662Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:11:31.7930912Z GPU_FLAG: --gpus all 2023-01-11T21:11:31.7931159Z ##[endgroup] 2023-01-11T21:11:31.7970334Z Filesystem Size Used Avail Use% Mounted on 2023-01-11T21:11:31.7970662Z devtmpfs 129G 0 129G 0% /dev 2023-01-11T21:11:31.7970953Z tmpfs 129G 0 129G 0% /dev/shm 2023-01-11T21:11:31.7971216Z tmpfs 129G 607k 129G 1% /run 2023-01-11T21:11:31.7971509Z tmpfs 129G 0 129G 0% /sys/fs/cgroup 2023-01-11T21:11:31.7972017Z /dev/xvda1 162G 29G 133G 18% / 2023-01-11T21:11:31.7972318Z tmpfs 26G 0 26G 0% /run/user/0 2023-01-11T21:11:31.7998231Z ##[group]Run .github/scripts/parse_ref.py 2023-01-11T21:11:31.7998607Z .github/scripts/parse_ref.py 2023-01-11T21:11:31.8010942Z shell: /usr/bin/bash -e {0} 2023-01-11T21:11:31.8011201Z env: 2023-01-11T21:11:31.8011424Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:11:31.8011691Z GPU_FLAG: --gpus all 2023-01-11T21:11:31.8011941Z ##[endgroup] 2023-01-11T21:11:31.8302888Z ##[group]Run set -x 2023-01-11T21:11:31.8303273Z set -x 2023-01-11T21:11:31.8303505Z  2023-01-11T21:11:31.8303760Z if [[ $TEST_CONFIG == 'multigpu' ]]; then 2023-01-11T21:11:31.8304110Z  TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh 2023-01-11T21:11:31.8304459Z elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then 2023-01-11T21:11:31.8304783Z  TEST_COMMAND=.jenkins/onnx/test.sh 2023-01-11T21:11:31.8305035Z else 2023-01-11T21:11:31.8305316Z  TEST_COMMAND=.jenkins/pytorch/test.sh 2023-01-11T21:11:31.8305588Z fi 2023-01-11T21:11:31.8305791Z  2023-01-11T21:11:31.8306108Z COMMIT_MESSAGES=$(git cherry -v "origin/${GIT_DEFAULT_BRANCH:-master}") 2023-01-11T21:11:31.8306433Z  2023-01-11T21:11:31.8306711Z # sanitize the input commit message and PR body here: 2023-01-11T21:11:31.8307003Z # 2023-01-11T21:11:31.8307384Z # trim all new lines from commit messages + PR_BODY to avoid issues with batch environment 2023-01-11T21:11:31.8307871Z # variable copying. 
see https://github.com/pytorch/pytorch/pull/80043#issuecomment-1167796028 2023-01-11T21:11:31.8308295Z COMMIT_MESSAGES="${COMMIT_MESSAGES//[$'\n\r']}" 2023-01-11T21:11:31.8308608Z PR_BODY="${PR_BODY//[$'\n\r']}" 2023-01-11T21:11:31.8308865Z  2023-01-11T21:11:31.8309198Z # then trim all special characters like single and double quotes to avoid unescaped inputs to 2023-01-11T21:11:31.8309570Z # wreak havoc internally 2023-01-11T21:11:31.8309889Z export COMMIT_MESSAGES="${COMMIT_MESSAGES//[\'\"]}" 2023-01-11T21:11:31.8310199Z export PR_BODY="${PR_BODY//[\'\"]}" 2023-01-11T21:11:31.8310461Z  2023-01-11T21:11:31.8310769Z # detached container should get cleaned up by teardown_ec2_linux 2023-01-11T21:11:31.8311151Z # TODO: Stop building test binaries as part of the build phase 2023-01-11T21:11:31.8311523Z # Used for GPU_FLAG since that doesn't play nice 2023-01-11T21:11:31.8311848Z # shellcheck disable=SC2086,SC2090 2023-01-11T21:11:31.8312148Z container_name=$(docker run \ 2023-01-11T21:11:31.8312406Z  ${GPU_FLAG:-} \ 2023-01-11T21:11:31.8312680Z  -e BUILD_ENVIRONMENT \ 2023-01-11T21:11:31.8312949Z  -e PR_NUMBER \ 2023-01-11T21:11:31.8313196Z  -e GITHUB_ACTIONS \ 2023-01-11T21:11:31.8313451Z  -e BASE_SHA \ 2023-01-11T21:11:31.8313701Z  -e BRANCH \ 2023-01-11T21:11:31.8313931Z  -e SHA1 \ 2023-01-11T21:11:31.8314189Z  -e AWS_DEFAULT_REGION \ 2023-01-11T21:11:31.8314468Z  -e IN_WHEEL_TEST \ 2023-01-11T21:11:31.8314840Z  -e SHARD_NUMBER \ 2023-01-11T21:11:31.8315103Z  -e TEST_CONFIG \ 2023-01-11T21:11:31.8315372Z  -e NUM_TEST_SHARDS \ 2023-01-11T21:11:31.8315634Z  -e PR_BODY \ 2023-01-11T21:11:31.8315882Z  -e COMMIT_MESSAGES \ 2023-01-11T21:11:31.8316166Z  -e CONTINUE_THROUGH_ERROR \ 2023-01-11T21:11:31.8316466Z  -e PYTORCH_RETRY_TEST_CASES \ 2023-01-11T21:11:31.8316764Z  -e PYTORCH_OVERRIDE_FLAKY_SIGNAL \ 2023-01-11T21:11:31.8317049Z  -e PR_LABELS \ 2023-01-11T21:11:31.8317341Z  -e MAX_JOBS="$(nproc --ignore=2)" \ 2023-01-11T21:11:31.8317613Z  -e SCCACHE_BUCKET \ 
2023-01-11T21:11:31.8317893Z  -e SCCACHE_S3_KEY_PREFIX \ 2023-01-11T21:11:31.8318163Z  -e XLA_CUDA \ 2023-01-11T21:11:31.8318429Z  -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ 2023-01-11T21:11:31.8318748Z  -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK \ 2023-01-11T21:11:31.8319074Z  -e PYTORCH_TEST_RERUN_DISABLED_TESTS \ 2023-01-11T21:11:31.8319428Z  --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ 2023-01-11T21:11:31.8319734Z  --ulimit stack=10485760:83886080 \ 2023-01-11T21:11:31.8320109Z  --security-opt seccomp=unconfined \ 2023-01-11T21:11:31.8320427Z  --cap-add=SYS_PTRACE \ 2023-01-11T21:11:31.8320678Z  --ipc=host \ 2023-01-11T21:11:31.8320949Z  --shm-size="${SHM_SIZE}" \ 2023-01-11T21:11:31.8321208Z  --tty \ 2023-01-11T21:11:31.8321433Z  --detach \ 2023-01-11T21:11:31.8321702Z  --name="${container_name}" \ 2023-01-11T21:11:31.8321978Z  --user jenkins \ 2023-01-11T21:11:31.8322278Z  -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ 2023-01-11T21:11:31.8322623Z  -w /var/lib/jenkins/workspace \ 2023-01-11T21:11:31.8322903Z  "${DOCKER_IMAGE}" 2023-01-11T21:11:31.8323148Z ) 2023-01-11T21:11:31.8323429Z echo "DOCKER_CONTAINER_ID=${container_name}" >> "${GITHUB_ENV}" 2023-01-11T21:11:31.8323876Z docker exec -t "${container_name}" sh -c "pip install $(echo dist/*.whl)[opt-einsum] && ${TEST_COMMAND}" 2023-01-11T21:11:31.8335352Z shell: /usr/bin/bash -e {0} 2023-01-11T21:11:31.8335588Z env: 2023-01-11T21:11:31.8335829Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:11:31.8336094Z GPU_FLAG: --gpus all 2023-01-11T21:11:31.8336406Z BUILD_ENVIRONMENT: linux-bionic-cuda11.6-py3.10-gcc7 2023-01-11T21:11:31.8337016Z PR_NUMBER: 91627 2023-01-11T21:11:31.8337270Z BRANCH: pull/91627 2023-01-11T21:11:31.8337551Z SHA1: 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e 2023-01-11T21:11:31.8337886Z BASE_SHA: db2a237763eb8693a20788be94f8c192e762baa8 2023-01-11T21:11:31.8338187Z PYTORCH_RETRY_TEST_CASES: 1 2023-01-11T21:11:31.8338455Z PYTORCH_OVERRIDE_FLAKY_SIGNAL: 1 2023-01-11T21:11:31.8338739Z TEST_CONFIG: 
distributed 2023-01-11T21:11:31.8338990Z SHARD_NUMBER: 3 2023-01-11T21:11:31.8339232Z NUM_TEST_SHARDS: 3 2023-01-11T21:11:31.8339502Z PR_BODY: Fixes #91003 cc @ezyang @gchanan 2023-01-11T21:11:31.8339796Z CONTINUE_THROUGH_ERROR: False 2023-01-11T21:11:31.8340132Z SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 2023-01-11T21:11:31.8340447Z SCCACHE_S3_KEY_PREFIX: pull 2023-01-11T21:11:31.8340706Z SHM_SIZE: 2g 2023-01-11T21:11:31.8341195Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7:fd224c2e6c79d7fdec6408da598bf52bc5b201dd 2023-01-11T21:11:31.8341651Z XLA_CUDA: 2023-01-11T21:11:31.8342001Z XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla 2023-01-11T21:11:31.8342379Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK: 0 2023-01-11T21:11:31.8342680Z PYTORCH_TEST_RERUN_DISABLED_TESTS: 0 2023-01-11T21:11:31.8342936Z ##[endgroup] 2023-01-11T21:11:31.8371378Z + [[ distributed == \m\u\l\t\i\g\p\u ]] 2023-01-11T21:11:31.8371890Z + [[ linux-bionic-cuda11.6-py3.10-gcc7 == *onnx* ]] 2023-01-11T21:11:31.8372231Z + TEST_COMMAND=.jenkins/pytorch/test.sh 2023-01-11T21:11:31.8375706Z ++ git cherry -v origin/master 2023-01-11T21:11:31.8902616Z + COMMIT_MESSAGES='+ 52a16ce42647731c772e14e7175afa40fda07b3d make torchgen rename also Number arguments into '\''input'\'' 2023-01-11T21:11:31.8903106Z + 87db01a53ecb702267ec36787654e418a52f8e93 fix torch.where signature mismatch 2023-01-11T21:11:31.8903679Z + 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e '\''other'\'' instead of '\''output'\'' in documentation' 2023-01-11T21:11:31.8904978Z + COMMIT_MESSAGES='+ 52a16ce42647731c772e14e7175afa40fda07b3d make torchgen rename also Number arguments into '\''input'\''+ 87db01a53ecb702267ec36787654e418a52f8e93 fix torch.where signature mismatch+ 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e '\''other'\'' instead of '\''output'\'' in documentation' 2023-01-11T21:11:31.8905678Z + PR_BODY='Fixes #91003 cc @ezyang @gchanan' 
2023-01-11T21:11:31.8907765Z + export 'COMMIT_MESSAGES=+ 52a16ce42647731c772e14e7175afa40fda07b3d make torchgen rename also Number arguments into input+ 87db01a53ecb702267ec36787654e418a52f8e93 fix torch.where signature mismatch+ 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e other instead of output in documentation' 2023-01-11T21:11:31.8909045Z + COMMIT_MESSAGES='+ 52a16ce42647731c772e14e7175afa40fda07b3d make torchgen rename also Number arguments into input+ 87db01a53ecb702267ec36787654e418a52f8e93 fix torch.where signature mismatch+ 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e other instead of output in documentation' 2023-01-11T21:11:31.8909726Z + export 'PR_BODY=Fixes #91003 cc @ezyang @gchanan' 2023-01-11T21:11:31.8910086Z + PR_BODY='Fixes #91003 cc @ezyang @gchanan' 2023-01-11T21:11:31.8917759Z +++ nproc --ignore=2 2023-01-11T21:11:31.8949733Z ++ docker run --gpus all -e BUILD_ENVIRONMENT -e PR_NUMBER -e GITHUB_ACTIONS -e BASE_SHA -e BRANCH -e SHA1 -e AWS_DEFAULT_REGION -e IN_WHEEL_TEST -e SHARD_NUMBER -e TEST_CONFIG -e NUM_TEST_SHARDS -e PR_BODY -e COMMIT_MESSAGES -e CONTINUE_THROUGH_ERROR -e PYTORCH_RETRY_TEST_CASES -e PYTORCH_OVERRIDE_FLAKY_SIGNAL -e PR_LABELS -e MAX_JOBS=30 -e SCCACHE_BUCKET -e SCCACHE_S3_KEY_PREFIX -e XLA_CUDA -e XLA_CLANG_CACHE_S3_BUCKET_NAME -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK -e PYTORCH_TEST_RERUN_DISABLED_TESTS --env-file=/tmp/github_env_3896099317 --ulimit stack=10485760:83886080 --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --ipc=host --shm-size=2g --tty --detach --name= --user jenkins -v /home/ec2-user/actions-runner/_work/pytorch/pytorch:/var/lib/jenkins/workspace -w /var/lib/jenkins/workspace 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7:fd224c2e6c79d7fdec6408da598bf52bc5b201dd 2023-01-11T21:11:46.1955787Z + container_name=7c5487d9c02b51c7d3fc373594ecc7b146547c11845b5a7279d2f69d27bfcc5b 2023-01-11T21:11:46.1956298Z + echo 
DOCKER_CONTAINER_ID=7c5487d9c02b51c7d3fc373594ecc7b146547c11845b5a7279d2f69d27bfcc5b 2023-01-11T21:11:46.1962789Z ++ echo dist/torch-2.0.0a0+git8419ddd-cp310-cp310-linux_x86_64.whl 2023-01-11T21:11:46.1964295Z + docker exec -t 7c5487d9c02b51c7d3fc373594ecc7b146547c11845b5a7279d2f69d27bfcc5b sh -c 'pip install dist/torch-2.0.0a0+git8419ddd-cp310-cp310-linux_x86_64.whl[opt-einsum] && .jenkins/pytorch/test.sh' 2023-01-11T21:11:46.7509072Z Processing ./dist/torch-2.0.0a0+git8419ddd-cp310-cp310-linux_x86_64.whl 2023-01-11T21:11:47.7240622Z Requirement already satisfied: typing-extensions in /opt/conda/lib/python3.10/site-packages (from torch==2.0.0a0+git8419ddd) (4.4.0) 2023-01-11T21:11:47.7244129Z Requirement already satisfied: sympy in /opt/conda/lib/python3.10/site-packages (from torch==2.0.0a0+git8419ddd) (1.11.1) 2023-01-11T21:11:47.7249158Z Requirement already satisfied: networkx in /opt/conda/lib/python3.10/site-packages (from torch==2.0.0a0+git8419ddd) (2.6.3) 2023-01-11T21:11:47.7266665Z Requirement already satisfied: opt-einsum>=3.3 in /opt/conda/lib/python3.10/site-packages (from torch==2.0.0a0+git8419ddd) (3.3.0) 2023-01-11T21:11:47.7349036Z Requirement already satisfied: numpy>=1.7 in /opt/conda/lib/python3.10/site-packages (from opt-einsum>=3.3->torch==2.0.0a0+git8419ddd) (1.21.2) 2023-01-11T21:11:47.7567092Z Requirement already satisfied: mpmath>=0.19 in /opt/conda/lib/python3.10/site-packages (from sympy->torch==2.0.0a0+git8419ddd) (1.2.1) 2023-01-11T21:11:48.7100300Z Installing collected packages: torch 2023-01-11T21:11:58.3169507Z Successfully installed torch-2.0.0a0+git8419ddd 2023-01-11T21:11:58.4715649Z + echo 'Environment variables:' 2023-01-11T21:11:58.4716016Z Environment variables: 2023-01-11T21:11:58.4718694Z + env 2023-01-11T21:11:58.4722838Z SHARD_NUMBER=3 2023-01-11T21:11:58.4723445Z NV_LIBCUBLAS_DEV_VERSION=11.9.2.110-1 2023-01-11T21:11:58.4723898Z NV_CUDA_COMPAT_PACKAGE=cuda-compat-11-6 2023-01-11T21:11:58.4725644Z 
LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64 2023-01-11T21:11:58.4726168Z NV_LIBNCCL_DEV_PACKAGE=libnccl-dev=2.12.10-1+cuda11.6 2023-01-11T21:11:58.4726477Z UCC_HOME=/usr 2023-01-11T21:11:58.4726876Z BUILD_ENVIRONMENT=linux-bionic-cuda11.6-py3.10-gcc7 2023-01-11T21:11:58.4727206Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=0 2023-01-11T21:11:58.4727620Z NV_LIBNPP_DEV_PACKAGE=libnpp-dev-11-6=11.6.3.124-1 2023-01-11T21:11:58.4727916Z INSTALLED_DB=yes 2023-01-11T21:11:58.4728150Z HOSTNAME=7c5487d9c02b 2023-01-11T21:11:58.4731336Z GITHUB_REF_NAME=91627/merge 2023-01-11T21:11:58.4731760Z GITHUB_API_URL=https://api.github.com 2023-01-11T21:11:58.4732100Z GITHUB_REPOSITORY_OWNER_ID=21003710 2023-01-11T21:11:58.4732389Z OPENSSL_DIR=/opt/openssl 2023-01-11T21:11:58.4732680Z UCC_COMMIT=1c7a7127186e7836f73aafbd7697bbc274a77eee 2023-01-11T21:11:58.4733328Z GITHUB_STEP_SUMMARY=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/step_summary_b67d5d81-a6e3-4068-8079-7402625cf872 2023-01-11T21:11:58.4733742Z CUDA_PATH=/usr/local/cuda 2023-01-11T21:11:58.4734221Z GITHUB_ACTION_PATH=/home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/setup-linux 2023-01-11T21:11:58.4734603Z GITHUB_RUN_ATTEMPT=2 2023-01-11T21:11:58.4734879Z TEST_CONFIG=distributed 2023-01-11T21:11:58.4735165Z NV_LIBNPP_VERSION=11.6.3.124-1 2023-01-11T21:11:58.4735560Z NV_NVPROF_DEV_PACKAGE=cuda-nvprof-11-6=11.6.124-1 2023-01-11T21:11:58.4735880Z GITHUB_REPOSITORY_OWNER=pytorch 2023-01-11T21:11:58.4736153Z GITHUB_ACTIONS=true 2023-01-11T21:11:58.4736425Z NVIDIA_VISIBLE_DEVICES=all 2023-01-11T21:11:58.4737271Z NV_NVPROF_VERSION=11.6.124-1 2023-01-11T21:11:58.4737596Z NV_LIBCUSPARSE_VERSION=11.7.2.124-1 2023-01-11T21:11:58.4737960Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/pull.yml@refs/pull/91627/merge 2023-01-11T21:11:58.4738317Z NVIDIA_PRODUCT_NAME=CUDA 2023-01-11T21:11:58.4738562Z CI=true 2023-01-11T21:11:58.4738793Z PYTORCH_OVERRIDE_FLAKY_SIGNAL=1 
2023-01-11T21:11:58.4739202Z NV_LIBCUBLAS_DEV_PACKAGE=libcublas-dev-11-6=11.9.2.110-1 2023-01-11T21:11:58.4739508Z BRANCH=pull/91627 2023-01-11T21:11:58.4739742Z GITHUB_HEAD_REF=master 2023-01-11T21:11:58.4740058Z UCX_COMMIT=31e74cac7bee0ef66bef2af72e7d86d9c282e5ab 2023-01-11T21:11:58.4740378Z GITHUB_ACTOR=LucaLumetti 2023-01-11T21:11:58.4740670Z CMAKE_CUDA_COMPILER_LAUNCHER=/opt/cache/bin/sccache 2023-01-11T21:11:58.4740977Z GITHUB_ACTION_REF= 2023-01-11T21:11:58.4741254Z NCCL_VERSION=2.12.10-1 2023-01-11T21:11:58.4741513Z GITHUB_ACTION=__self 2023-01-11T21:11:58.4741762Z GITHUB_REF_PROTECTED=false 2023-01-11T21:11:58.4742216Z XLA_CLANG_CACHE_S3_BUCKET_NAME=ossci-compiler-clang-cache-circleci-xla 2023-01-11T21:11:58.4742601Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0 2023-01-11T21:11:58.4743412Z *** 2023-01-11T21:11:58.4743656Z INSTALLED_VISION=yes 2023-01-11T21:11:58.4743903Z NVARCH=x86_64 2023-01-11T21:11:58.4744197Z NV_LIBCUSPARSE_DEV_VERSION=11.7.2.124-1 2023-01-11T21:11:58.4744475Z HOME=/var/lib/jenkins 2023-01-11T21:11:58.4745001Z GITHUB_STATE=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/save_state_b67d5d81-a6e3-4068-8079-7402625cf872 2023-01-11T21:11:58.4745388Z CARGO_NET_GIT_FETCH_WITH_CLI=true 2023-01-11T21:11:58.4745672Z GITHUB_ACTION_REPOSITORY= 2023-01-11T21:11:58.4745935Z GITHUB_REF_TYPE=branch 2023-01-11T21:11:58.4746319Z NV_LIBNCCL_PACKAGE_VERSION=2.12.10-1 2023-01-11T21:11:58.4746760Z GITHUB_RETENTION_DAYS=90 2023-01-11T21:11:58.4747151Z SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2 2023-01-11T21:11:58.4747562Z NV_LIBNCCL_PACKAGE=libnccl2=2.12.10-1+cuda11.6 2023-01-11T21:11:58.4748091Z GITHUB_ENV=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_env_b67d5d81-a6e3-4068-8079-7402625cf872 2023-01-11T21:11:58.4748487Z DEBIAN_FRONTEND=noninteractive 2023-01-11T21:11:58.4748836Z NV_LIBNCCL_DEV_PACKAGE_NAME=libnccl-dev 2023-01-11T21:11:58.4749115Z GITHUB_REF=refs/pull/91627/merge 2023-01-11T21:11:58.4749414Z 
NV_CUDA_LIB_VERSION=11.6.2-1 2023-01-11T21:11:58.4749717Z GITHUB_SHA=57fc38f02f250896a12b32cfa200a6105a03d09c 2023-01-11T21:11:58.4750016Z INSTALLED_PROTOBUF=yes 2023-01-11T21:11:58.4750273Z GITHUB_REPOSITORY_ID=65600975 2023-01-11T21:11:58.4750538Z GITHUB_RUN_ID=3896099317 2023-01-11T21:11:58.4750884Z NV_LIBNPP_PACKAGE=libnpp-11-6=11.6.3.124-1 2023-01-11T21:11:58.4751168Z NV_LIBNCCL_PACKAGE_NAME=libnccl2 2023-01-11T21:11:58.4751464Z LIBRARY_PATH=/usr/local/cuda/lib64/stubs 2023-01-11T21:11:58.4751779Z NV_NVTX_VERSION=11.6.124-1 2023-01-11T21:11:58.4752035Z CONTINUE_THROUGH_ERROR=False 2023-01-11T21:11:58.4752336Z GITHUB_SERVER_URL=https://github.com 2023-01-11T21:11:58.4752692Z MAX_JOBS=30 2023-01-11T21:11:58.4752932Z GITHUB_ACTOR_ID=7543386 2023-01-11T21:11:58.4753232Z NV_LIBCUBLAS_VERSION=11.9.2.110-1 2023-01-11T21:11:58.4753609Z NV_LIBCUBLAS_PACKAGE=libcublas-11-6=11.9.2.110-1 2023-01-11T21:11:58.4754073Z GITHUB_EVENT_PATH=/home/ec2-user/actions-runner/_work/_temp/_github_workflow/event.json 2023-01-11T21:11:58.4754413Z UCX_HOME=/usr 2023-01-11T21:11:58.4754667Z PYTORCH_RETRY_TEST_CASES=1 2023-01-11T21:11:58.4754977Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2023-01-11T21:11:58.4755323Z BASE_SHA=db2a237763eb8693a20788be94f8c192e762baa8 2023-01-11T21:11:58.4755660Z NV_CUDA_CUDART_DEV_VERSION=11.6.55-1 2023-01-11T21:11:58.4755942Z PR_BODY=Fixes #91003 cc @ezyang @gchanan 2023-01-11T21:11:58.4756224Z GITHUB_BASE_REF=master 2023-01-11T21:11:58.4756471Z TERM=xterm 2023-01-11T21:11:58.4756676Z XLA_CUDA= 2023-01-11T21:11:58.4756941Z NV_NVML_DEV_VERSION=11.6.55-1 2023-01-11T21:11:58.4757212Z TORCH_CUDA_ARCH_LIST=Maxwell 2023-01-11T21:11:58.4757453Z CUDA_VERSION=11.6.2 2023-01-11T21:11:58.4757792Z NV_LIBCUBLAS_PACKAGE_NAME=libcublas-11-6 2023-01-11T21:11:58.4758091Z OPENSSL_ROOT_DIR=/opt/openssl 2023-01-11T21:11:58.4758620Z GITHUB_PATH=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/add_path_b67d5d81-a6e3-4068-8079-7402625cf872 
2023-01-11T21:11:58.4758977Z GITHUB_JOB=test 2023-01-11T21:11:58.4759229Z SCCACHE_S3_KEY_PREFIX=pull 2023-01-11T21:11:58.4759827Z COMMIT_MESSAGES=+ 52a16ce42647731c772e14e7175afa40fda07b3d make torchgen rename also Number arguments into input+ 87db01a53ecb702267ec36787654e418a52f8e93 fix torch.where signature mismatch+ 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e other instead of output in documentation 2023-01-11T21:11:58.4760488Z NVIDIA_DRIVER_CAPABILITIES=compute,utility 2023-01-11T21:11:58.4760757Z NUM_TEST_SHARDS=3 2023-01-11T21:11:58.4761007Z PR_NUMBER=91627 2023-01-11T21:11:58.4761528Z GITHUB_OUTPUT=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_output_b67d5d81-a6e3-4068-8079-7402625cf872 2023-01-11T21:11:58.4761878Z SHLVL=1 2023-01-11T21:11:58.4762221Z NV_LIBCUBLAS_DEV_PACKAGE_NAME=libcublas-dev-11-6 2023-01-11T21:11:58.4762549Z GITHUB_REPOSITORY=pytorch/pytorch 2023-01-11T21:11:58.4763336Z NVIDIA_REQUIRE_CUDA=cuda>=11.6 brand=tesla,driver>=418,driver<419 brand=tesla,driver>=450,driver<451 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=titan,driver>=470,driver<471 brand=titanrtx,driver>=470,driver<471 2023-01-11T21:11:58.4764084Z NV_LIBNPP_DEV_VERSION=11.6.3.124-1 2023-01-11T21:11:58.4764398Z SHA1=8419ddda87c8a47eacc63b54bc7ec98c1f27c26e 2023-01-11T21:11:58.4764777Z GITHUB_EVENT_NAME=pull_request 2023-01-11T21:11:58.4765072Z NV_CUDA_CUDART_VERSION=11.6.55-1 2023-01-11T21:11:58.4765419Z TORCH_NVCC_FLAGS=-Xfatbin -compress-all 2023-01-11T21:11:58.4765706Z GITHUB_RUN_NUMBER=77928 2023-01-11T21:11:58.4765968Z GITHUB_WORKFLOW=pull 2023-01-11T21:11:58.4766364Z 
PATH=/opt/cache/bin:/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2023-01-11T21:11:58.4766826Z NV_LIBNCCL_DEV_PACKAGE_VERSION=2.12.10-1 2023-01-11T21:11:58.4767185Z GITHUB_WORKFLOW_SHA=57fc38f02f250896a12b32cfa200a6105a03d09c 2023-01-11T21:11:58.4767655Z GITHUB_WORKSPACE=/home/ec2-user/actions-runner/_work/pytorch/pytorch 2023-01-11T21:11:58.4768018Z GITHUB_TRIGGERING_ACTOR=albanD 2023-01-11T21:11:58.4768289Z _=/usr/bin/env 2023-01-11T21:11:58.4768673Z ++ python -c 'import site; print(site.getsitepackages()[0])' 2023-01-11T21:11:58.4949363Z + TORCH_INSTALL_DIR=/opt/conda/lib/python3.10/site-packages/torch 2023-01-11T21:11:58.4949861Z + TORCH_BIN_DIR=/opt/conda/lib/python3.10/site-packages/torch/bin 2023-01-11T21:11:58.4950321Z + TORCH_LIB_DIR=/opt/conda/lib/python3.10/site-packages/torch/lib 2023-01-11T21:11:58.4950898Z + TORCH_TEST_DIR=/opt/conda/lib/python3.10/site-packages/torch/test 2023-01-11T21:11:58.4951237Z + BUILD_DIR=build 2023-01-11T21:11:58.4951519Z + BUILD_RENAMED_DIR=build_renamed 2023-01-11T21:11:58.4951774Z + BUILD_BIN_DIR=build/bin 2023-01-11T21:11:58.4952036Z + export VALGRIND=ON 2023-01-11T21:11:58.4952280Z + VALGRIND=ON 2023-01-11T21:11:58.4952531Z + export TORCH_INDUCTOR_INSTALL_GXX=ON 2023-01-11T21:11:58.4952827Z + TORCH_INDUCTOR_INSTALL_GXX=ON 2023-01-11T21:11:58.4953244Z + [[ linux-bionic-cuda11.6-py3.10-gcc7 == *clang9* ]] 2023-01-11T21:11:58.4953665Z + [[ linux-bionic-cuda11.6-py3.10-gcc7 != *bazel* ]] 2023-01-11T21:11:58.4956356Z ++ realpath build/custom_test_artifacts 2023-01-11T21:11:58.4963989Z + CUSTOM_TEST_ARTIFACT_BUILD_DIR=/var/lib/jenkins/workspace/build/custom_test_artifacts 2023-01-11T21:11:58.4967873Z ++ dirname .jenkins/pytorch/test.sh 2023-01-11T21:11:58.4974947Z + source .jenkins/pytorch/common.sh 2023-01-11T21:11:58.4979564Z +++ dirname .jenkins/pytorch/common.sh 2023-01-11T21:11:58.4989690Z ++ source .jenkins/pytorch/common_utils.sh 
2023-01-11T21:11:58.4992231Z +++ declare -f -t trap_add 2023-01-11T21:11:58.4998833Z ++ set -ex 2023-01-11T21:11:58.4999524Z ++ [[ linux-bionic-cuda11.6-py3.10-gcc7 == *rocm* ]] 2023-01-11T21:11:58.4999874Z ++ BUILD_TEST_LIBTORCH=0 2023-01-11T21:11:58.5000482Z + echo 'Environment variables' 2023-01-11T21:11:58.5000769Z Environment variables 2023-01-11T21:11:58.5000990Z + env 2023-01-11T21:11:58.5007749Z SHARD_NUMBER=3 2023-01-11T21:11:58.5008318Z NV_LIBCUBLAS_DEV_VERSION=11.9.2.110-1 2023-01-11T21:11:58.5008680Z NV_CUDA_COMPAT_PACKAGE=cuda-compat-11-6 2023-01-11T21:11:58.5009037Z LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64 2023-01-11T21:11:58.5009737Z NV_LIBNCCL_DEV_PACKAGE=libnccl-dev=2.12.10-1+cuda11.6 2023-01-11T21:11:58.5010011Z UCC_HOME=/usr 2023-01-11T21:11:58.5010404Z BUILD_ENVIRONMENT=linux-bionic-cuda11.6-py3.10-gcc7 2023-01-11T21:11:58.5011032Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=0 2023-01-11T21:11:58.5011434Z NV_LIBNPP_DEV_PACKAGE=libnpp-dev-11-6=11.6.3.124-1 2023-01-11T21:11:58.5011718Z INSTALLED_DB=yes 2023-01-11T21:11:58.5012104Z HOSTNAME=7c5487d9c02b 2023-01-11T21:11:58.5012526Z GITHUB_REF_NAME=91627/merge 2023-01-11T21:11:58.5012823Z GITHUB_API_URL=https://api.github.com 2023-01-11T21:11:58.5013140Z GITHUB_REPOSITORY_OWNER_ID=21003710 2023-01-11T21:11:58.5013623Z OPENSSL_DIR=/opt/openssl 2023-01-11T21:11:58.5013983Z UCC_COMMIT=1c7a7127186e7836f73aafbd7697bbc274a77eee 2023-01-11T21:11:58.5014590Z GITHUB_STEP_SUMMARY=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/step_summary_b67d5d81-a6e3-4068-8079-7402625cf872 2023-01-11T21:11:58.5015087Z CUDA_PATH=/usr/local/cuda 2023-01-11T21:11:58.5015753Z GITHUB_ACTION_PATH=/home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/setup-linux 2023-01-11T21:11:58.5016316Z GITHUB_RUN_ATTEMPT=2 2023-01-11T21:11:58.5016851Z TEST_CONFIG=distributed 2023-01-11T21:11:58.5017180Z NV_LIBNPP_VERSION=11.6.3.124-1 2023-01-11T21:11:58.5017667Z 
NV_NVPROF_DEV_PACKAGE=cuda-nvprof-11-6=11.6.124-1 2023-01-11T21:11:58.5018148Z GITHUB_REPOSITORY_OWNER=pytorch 2023-01-11T21:11:58.5018425Z GITHUB_ACTIONS=true 2023-01-11T21:11:58.5018667Z NVIDIA_VISIBLE_DEVICES=all 2023-01-11T21:11:58.5018973Z NV_NVPROF_VERSION=11.6.124-1 2023-01-11T21:11:58.5019286Z NV_LIBCUSPARSE_VERSION=11.7.2.124-1 2023-01-11T21:11:58.5019653Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/pull.yml@refs/pull/91627/merge 2023-01-11T21:11:58.5020008Z NVIDIA_PRODUCT_NAME=CUDA 2023-01-11T21:11:58.5020253Z CI=true 2023-01-11T21:11:58.5020486Z PYTORCH_OVERRIDE_FLAKY_SIGNAL=1 2023-01-11T21:11:58.5020944Z NV_LIBCUBLAS_DEV_PACKAGE=libcublas-dev-11-6=11.9.2.110-1 2023-01-11T21:11:58.5021246Z BRANCH=pull/91627 2023-01-11T21:11:58.5021504Z GITHUB_HEAD_REF=master 2023-01-11T21:11:58.5022060Z UCX_COMMIT=31e74cac7bee0ef66bef2af72e7d86d9c282e5ab 2023-01-11T21:11:58.5022400Z GITHUB_ACTOR=LucaLumetti 2023-01-11T21:11:58.5022714Z CMAKE_CUDA_COMPILER_LAUNCHER=/opt/cache/bin/sccache 2023-01-11T21:11:58.5022993Z GITHUB_ACTION_REF= 2023-01-11T21:11:58.5023385Z NCCL_VERSION=2.12.10-1 2023-01-11T21:11:58.5023657Z GITHUB_ACTION=__self 2023-01-11T21:11:58.5023883Z VALGRIND=ON 2023-01-11T21:11:58.5024138Z GITHUB_REF_PROTECTED=false 2023-01-11T21:11:58.5024591Z XLA_CLANG_CACHE_S3_BUCKET_NAME=ossci-compiler-clang-cache-circleci-xla 2023-01-11T21:11:58.5024957Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0 2023-01-11T21:11:58.5025320Z *** 2023-01-11T21:11:58.5025551Z INSTALLED_VISION=yes 2023-01-11T21:11:58.5025804Z NVARCH=x86_64 2023-01-11T21:11:58.5026085Z NV_LIBCUSPARSE_DEV_VERSION=11.7.2.124-1 2023-01-11T21:11:58.5026365Z HOME=/var/lib/jenkins 2023-01-11T21:11:58.5026885Z GITHUB_STATE=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/save_state_b67d5d81-a6e3-4068-8079-7402625cf872 2023-01-11T21:11:58.5027266Z CARGO_NET_GIT_FETCH_WITH_CLI=true 2023-01-11T21:11:58.5027552Z GITHUB_ACTION_REPOSITORY= 2023-01-11T21:11:58.5027816Z GITHUB_REF_TYPE=branch 
2023-01-11T21:11:58.5028106Z NV_LIBNCCL_PACKAGE_VERSION=2.12.10-1 2023-01-11T21:11:58.5028387Z GITHUB_RETENTION_DAYS=90 2023-01-11T21:11:58.5028772Z SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2 2023-01-11T21:11:58.5029467Z NV_LIBNCCL_PACKAGE=libnccl2=2.12.10-1+cuda11.6 2023-01-11T21:11:58.5030018Z GITHUB_ENV=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_env_b67d5d81-a6e3-4068-8079-7402625cf872 2023-01-11T21:11:58.5030415Z DEBIAN_FRONTEND=noninteractive 2023-01-11T21:11:58.5030746Z NV_LIBNCCL_DEV_PACKAGE_NAME=libnccl-dev 2023-01-11T21:11:58.5031046Z GITHUB_REF=refs/pull/91627/merge 2023-01-11T21:11:58.5031348Z NV_CUDA_LIB_VERSION=11.6.2-1 2023-01-11T21:11:58.5031662Z GITHUB_SHA=57fc38f02f250896a12b32cfa200a6105a03d09c 2023-01-11T21:11:58.5031942Z INSTALLED_PROTOBUF=yes 2023-01-11T21:11:58.5032214Z GITHUB_REPOSITORY_ID=65600975 2023-01-11T21:11:58.5032481Z GITHUB_RUN_ID=3896099317 2023-01-11T21:11:58.5032816Z NV_LIBNPP_PACKAGE=libnpp-11-6=11.6.3.124-1 2023-01-11T21:11:58.5033120Z NV_LIBNCCL_PACKAGE_NAME=libnccl2 2023-01-11T21:11:58.5033417Z LIBRARY_PATH=/usr/local/cuda/lib64/stubs 2023-01-11T21:11:58.5033710Z NV_NVTX_VERSION=11.6.124-1 2023-01-11T21:11:58.5033980Z CONTINUE_THROUGH_ERROR=False 2023-01-11T21:11:58.5034286Z GITHUB_SERVER_URL=https://github.com 2023-01-11T21:11:58.5034539Z MAX_JOBS=30 2023-01-11T21:11:58.5034779Z GITHUB_ACTOR_ID=7543386 2023-01-11T21:11:58.5035074Z NV_LIBCUBLAS_VERSION=11.9.2.110-1 2023-01-11T21:11:58.5035428Z NV_LIBCUBLAS_PACKAGE=libcublas-11-6=11.9.2.110-1 2023-01-11T21:11:58.5035912Z GITHUB_EVENT_PATH=/home/ec2-user/actions-runner/_work/_temp/_github_workflow/event.json 2023-01-11T21:11:58.5036256Z UCX_HOME=/usr 2023-01-11T21:11:58.5036490Z PYTORCH_RETRY_TEST_CASES=1 2023-01-11T21:11:58.5036816Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2023-01-11T21:11:58.5037164Z BASE_SHA=db2a237763eb8693a20788be94f8c192e762baa8 2023-01-11T21:11:58.5037481Z NV_CUDA_CUDART_DEV_VERSION=11.6.55-1 
2023-01-11T21:11:58.5037895Z PR_BODY=Fixes #91003 cc @ezyang @gchanan 2023-01-11T21:11:58.5038173Z GITHUB_BASE_REF=master 2023-01-11T21:11:58.5038399Z TERM=xterm 2023-01-11T21:11:58.5038655Z TORCH_INDUCTOR_INSTALL_GXX=ON 2023-01-11T21:11:58.5038912Z XLA_CUDA= 2023-01-11T21:11:58.5039169Z NV_NVML_DEV_VERSION=11.6.55-1 2023-01-11T21:11:58.5039442Z TORCH_CUDA_ARCH_LIST=Maxwell 2023-01-11T21:11:58.5039702Z CUDA_VERSION=11.6.2 2023-01-11T21:11:58.5040042Z NV_LIBCUBLAS_PACKAGE_NAME=libcublas-11-6 2023-01-11T21:11:58.5040320Z OPENSSL_ROOT_DIR=/opt/openssl 2023-01-11T21:11:58.5040847Z GITHUB_PATH=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/add_path_b67d5d81-a6e3-4068-8079-7402625cf872 2023-01-11T21:11:58.5041223Z GITHUB_JOB=test 2023-01-11T21:11:58.5041459Z SCCACHE_S3_KEY_PREFIX=pull 2023-01-11T21:11:58.5042060Z COMMIT_MESSAGES=+ 52a16ce42647731c772e14e7175afa40fda07b3d make torchgen rename also Number arguments into input+ 87db01a53ecb702267ec36787654e418a52f8e93 fix torch.where signature mismatch+ 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e other instead of output in documentation 2023-01-11T21:11:58.5042883Z NVIDIA_DRIVER_CAPABILITIES=compute,utility 2023-01-11T21:11:58.5043299Z NUM_TEST_SHARDS=3 2023-01-11T21:11:58.5043607Z PR_NUMBER=91627 2023-01-11T21:11:58.5044149Z GITHUB_OUTPUT=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_output_b67d5d81-a6e3-4068-8079-7402625cf872 2023-01-11T21:11:58.5044516Z SHLVL=1 2023-01-11T21:11:58.5044841Z NV_LIBCUBLAS_DEV_PACKAGE_NAME=libcublas-dev-11-6 2023-01-11T21:11:58.5045171Z GITHUB_REPOSITORY=pytorch/pytorch 2023-01-11T21:11:58.5045960Z NVIDIA_REQUIRE_CUDA=cuda>=11.6 brand=tesla,driver>=418,driver<419 brand=tesla,driver>=450,driver<451 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 
brand=quadrortx,driver>=470,driver<471 brand=titan,driver>=470,driver<471 brand=titanrtx,driver>=470,driver<471 2023-01-11T21:11:58.5046729Z NV_LIBNPP_DEV_VERSION=11.6.3.124-1 2023-01-11T21:11:58.5047032Z SHA1=8419ddda87c8a47eacc63b54bc7ec98c1f27c26e 2023-01-11T21:11:58.5047341Z GITHUB_EVENT_NAME=pull_request 2023-01-11T21:11:58.5047650Z NV_CUDA_CUDART_VERSION=11.6.55-1 2023-01-11T21:11:58.5047996Z TORCH_NVCC_FLAGS=-Xfatbin -compress-all 2023-01-11T21:11:58.5048311Z GITHUB_RUN_NUMBER=77928 2023-01-11T21:11:58.5048574Z GITHUB_WORKFLOW=pull 2023-01-11T21:11:58.5048988Z PATH=/opt/cache/bin:/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2023-01-11T21:11:58.5049430Z NV_LIBNCCL_DEV_PACKAGE_VERSION=2.12.10-1 2023-01-11T21:11:58.5049787Z GITHUB_WORKFLOW_SHA=57fc38f02f250896a12b32cfa200a6105a03d09c 2023-01-11T21:11:58.5050275Z GITHUB_WORKSPACE=/home/ec2-user/actions-runner/_work/pytorch/pytorch 2023-01-11T21:11:58.5050625Z GITHUB_TRIGGERING_ACTOR=albanD 2023-01-11T21:11:58.5050900Z _=/usr/bin/env 2023-01-11T21:11:58.5051203Z + echo 'Testing pytorch' 2023-01-11T21:11:58.5051450Z Testing pytorch 2023-01-11T21:11:58.5051734Z + export LANG=C.UTF-8 2023-01-11T21:11:58.5052012Z + LANG=C.UTF-8 2023-01-11T21:11:58.5052247Z + PR_NUMBER=91627 2023-01-11T21:11:58.5052526Z + [[ distributed == \d\e\f\a\u\l\t ]] 2023-01-11T21:11:58.5052841Z + [[ distributed == \d\i\s\t\r\i\b\u\t\e\d ]] 2023-01-11T21:11:58.5053245Z + [[ linux-bionic-cuda11.6-py3.10-gcc7 == *rocm* ]] 2023-01-11T21:11:58.5053573Z + [[ distributed == \s\l\o\w ]] 2023-01-11T21:11:58.5054003Z + [[ linux-bionic-cuda11.6-py3.10-gcc7 == *slow-gradcheck* ]] 2023-01-11T21:11:58.5054468Z + [[ linux-bionic-cuda11.6-py3.10-gcc7 == *cuda* ]] 2023-01-11T21:11:58.5054817Z + export PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2023-01-11T21:11:58.5055150Z + PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2023-01-11T21:11:58.5055458Z + [[ distributed == *crossref* ]] 
2023-01-11T21:11:58.5055731Z + [[ distributed == *dynamo* ]] 2023-01-11T21:11:58.5056097Z + [[ distributed == *inductor* ]] 2023-01-11T21:11:58.5056513Z + [[ linux-bionic-cuda11.6-py3.10-gcc7 == *rocm* ]] 2023-01-11T21:11:58.5057187Z + [[ linux-bionic-cuda11.6-py3.10-gcc7 != *-bazel-* ]] 2023-01-11T21:11:58.5057606Z + pip_install --user ninja==1.10.2 2023-01-11T21:11:58.5058023Z + pip install --progress-bar off --user ninja==1.10.2 2023-01-11T21:11:59.0512274Z Collecting ninja==1.10.2 2023-01-11T21:11:59.0730529Z Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl (108 kB) 2023-01-11T21:11:59.9611997Z Installing collected packages: ninja 2023-01-11T21:11:59.9712168Z  WARNING: The script ninja is installed in '/var/lib/jenkins/.local/bin' which is not on PATH. 2023-01-11T21:11:59.9712848Z Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location. 2023-01-11T21:11:59.9763826Z Successfully installed ninja-1.10.2 2023-01-11T21:12:00.0418603Z + export PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2023-01-11T21:12:00.0419637Z + PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2023-01-11T21:12:00.0420343Z + [[ linux-bionic-cuda11.6-py3.10-gcc7 == *asan* ]] 2023-01-11T21:12:00.0420767Z + [[ linux-bionic-cuda11.6-py3.10-gcc7 == *-tsan* ]] 2023-01-11T21:12:00.0421117Z + [[ distributed == \n\o\g\p\u\_\N\O\_\A\V\X\2 ]] 2023-01-11T21:12:00.0421450Z + [[ distributed == \n\o\g\p\u\_\A\V\X\5\1\2 ]] 2023-01-11T21:12:00.0430878Z + [[ linux-bionic-cuda11.6-py3.10-gcc7 == *tbb* ]] 2023-01-11T21:12:00.0446405Z + [[ linux-bionic-cuda11.6-py3.10-gcc7 == *libtorch* ]] 2023-01-11T21:12:00.0446935Z + [[ linux-bionic-cuda11.6-py3.10-gcc7 == *-bazel-* ]] 
2023-01-11T21:12:00.0447373Z + [[ linux-bionic-cuda11.6-py3.10-gcc7 == *-tsan* ]] 2023-01-11T21:12:00.0449674Z + cd test 2023-01-11T21:12:00.0450245Z + python -c 'import torch; print(torch.__config__.show())' 2023-01-11T21:12:01.6678748Z PyTorch built with: 2023-01-11T21:12:01.6679183Z - GCC 7.5 2023-01-11T21:12:01.6679519Z - C++ Version: 201703 2023-01-11T21:12:01.6680076Z - Intel(R) oneAPI Math Kernel Library Version 2022.0-Product Build 20211112 for Intel(R) 64 architecture applications 2023-01-11T21:12:01.6680622Z - Intel(R) MKL-DNN v2.7.2 (Git Hash fbec3e25a559ee252022ae066817b204e106a6ba) 2023-01-11T21:12:01.6681025Z - OpenMP 201511 (a.k.a. OpenMP 4.5) 2023-01-11T21:12:01.6681396Z - LAPACK is enabled (usually provided by MKL) 2023-01-11T21:12:01.6681703Z - NNPACK is enabled 2023-01-11T21:12:01.6682012Z - CPU capability usage: AVX2 2023-01-11T21:12:01.6682314Z - CUDA Runtime 11.6 2023-01-11T21:12:01.6682688Z - NVCC architecture flags: -gencode;arch=compute_52,code=sm_52 2023-01-11T21:12:01.6683087Z - CuDNN 8.3.2 (built against CUDA 11.5) 2023-01-11T21:12:01.6683387Z - Magma 2.6.1 2023-01-11T21:12:01.6686477Z - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.6, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/cache/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Werror 
-Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, FORCE_FALLBACK_CUDA_MPI=1, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=ON, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 2023-01-11T21:12:01.6689093Z 2023-01-11T21:12:01.9042686Z + cd test 2023-01-11T21:12:01.9043213Z + python -c 'import torch; print(torch.__config__.parallel_info())' 2023-01-11T21:12:03.4902207Z ATen/Parallel: 2023-01-11T21:12:03.4902545Z at::get_num_threads() : 16 2023-01-11T21:12:03.4902836Z at::get_num_interop_threads() : 16 2023-01-11T21:12:03.4903131Z OpenMP 201511 (a.k.a. OpenMP 4.5) 2023-01-11T21:12:03.4903409Z omp_get_max_threads() : 16 2023-01-11T21:12:03.4904053Z Intel(R) oneAPI Math Kernel Library Version 2022.0-Product Build 20211112 for Intel(R) 64 architecture applications 2023-01-11T21:12:03.4904423Z mkl_get_max_threads() : 16 2023-01-11T21:12:03.4904859Z Intel(R) MKL-DNN v2.7.2 (Git Hash fbec3e25a559ee252022ae066817b204e106a6ba) 2023-01-11T21:12:03.4905242Z std::thread::hardware_concurrency() : 32 2023-01-11T21:12:03.4905513Z Environment variables: 2023-01-11T21:12:03.4905781Z OMP_NUM_THREADS : [not set] 2023-01-11T21:12:03.4906339Z MKL_NUM_THREADS : [not set] 2023-01-11T21:12:03.4906616Z ATen parallel backend: OpenMP 2023-01-11T21:12:03.4906797Z 2023-01-11T21:12:03.7222021Z + [[ distributed == *backward* ]] 2023-01-11T21:12:03.7222335Z + [[ distributed == *xla* ]] 2023-01-11T21:12:03.7222640Z + [[ distributed == \j\i\t\_\l\e\g\a\c\y ]] 2023-01-11T21:12:03.7223202Z + [[ linux-bionic-cuda11.6-py3.10-gcc7 == *libtorch* ]] 2023-01-11T21:12:03.7223531Z + [[ distributed == distributed ]] 2023-01-11T21:12:03.7223801Z + install_filelock 2023-01-11T21:12:03.7224059Z + pip_install filelock 
2023-01-11T21:12:03.7224389Z + pip install --progress-bar off filelock 2023-01-11T21:12:04.2405105Z Collecting filelock 2023-01-11T21:12:04.2618513Z Downloading filelock-3.9.0-py3-none-any.whl (9.7 kB) 2023-01-11T21:12:05.1691538Z Installing collected packages: filelock 2023-01-11T21:12:05.2046753Z Successfully installed filelock-3.9.0 2023-01-11T21:12:05.2684900Z + install_triton 2023-01-11T21:12:05.2685166Z + local commit 2023-01-11T21:12:05.2685450Z + [[ distributed == *rocm* ]] 2023-01-11T21:12:05.2688783Z ++ get_pinned_commit triton 2023-01-11T21:12:05.2689308Z ++ cat .github/ci_commit_pins/triton.txt 2023-01-11T21:12:05.2704258Z + commit=0d7e7532279e45672555e344646f5c19c3972331 2023-01-11T21:12:05.2704939Z + pip_install --user git+https://github.com/openai/triton@0d7e7532279e45672555e344646f5c19c3972331#subdirectory=python 2023-01-11T21:12:05.2705937Z + pip install --progress-bar off --user git+https://github.com/openai/triton@0d7e7532279e45672555e344646f5c19c3972331#subdirectory=python 2023-01-11T21:12:05.7286762Z Collecting git+https://github.com/openai/triton@0d7e7532279e45672555e344646f5c19c3972331#subdirectory=python 2023-01-11T21:12:05.7292825Z Cloning https://github.com/openai/triton (to revision 0d7e7532279e45672555e344646f5c19c3972331) to /tmp/pip-req-build-w9ss0w7a 2023-01-11T21:12:05.7313417Z Running command git clone --filter=blob:none --quiet https://github.com/openai/triton /tmp/pip-req-build-w9ss0w7a 2023-01-11T21:12:06.5224551Z Running command git rev-parse -q --verify 'sha^0d7e7532279e45672555e344646f5c19c3972331' 2023-01-11T21:12:06.5245114Z Running command git fetch -q https://github.com/openai/triton 0d7e7532279e45672555e344646f5c19c3972331 2023-01-11T21:12:06.9529125Z Running command git checkout -q 0d7e7532279e45672555e344646f5c19c3972331 2023-01-11T21:12:07.4484370Z Resolved https://github.com/openai/triton to commit 0d7e7532279e45672555e344646f5c19c3972331 2023-01-11T21:12:07.4485574Z Running command git submodule update --init 
--recursive -q 2023-01-11T21:12:08.0477276Z Preparing metadata (setup.py) ... done 2023-01-11T21:12:08.2551539Z Collecting cmake 2023-01-11T21:12:08.2795593Z Downloading cmake-3.25.0-py2.py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (23.7 MB) 2023-01-11T21:12:08.6117161Z Requirement already satisfied: filelock in /opt/conda/lib/python3.10/site-packages (from triton==2.0.0) (3.9.0) 2023-01-11T21:12:08.6121243Z Requirement already satisfied: torch in /opt/conda/lib/python3.10/site-packages (from triton==2.0.0) (2.0.0a0+git8419ddd) 2023-01-11T21:12:08.6374580Z Requirement already satisfied: sympy in /opt/conda/lib/python3.10/site-packages (from torch->triton==2.0.0) (1.11.1) 2023-01-11T21:12:08.6380205Z Requirement already satisfied: typing-extensions in /opt/conda/lib/python3.10/site-packages (from torch->triton==2.0.0) (4.4.0) 2023-01-11T21:12:08.6385117Z Requirement already satisfied: networkx in /opt/conda/lib/python3.10/site-packages (from torch->triton==2.0.0) (2.6.3) 2023-01-11T21:12:08.6596750Z Requirement already satisfied: mpmath>=0.19 in /opt/conda/lib/python3.10/site-packages (from sympy->torch->triton==2.0.0) (1.2.1) 2023-01-11T21:12:08.6668193Z Building wheels for collected packages: triton 2023-01-11T21:13:02.1742453Z Building wheel for triton (setup.py) ... done
2023-01-11T21:13:02.2226567Z Created wheel for triton: filename=triton-2.0.0-cp310-cp310-linux_x86_64.whl size=15377935 sha256=50776f5cf7bdf8957ec8ff317f6a6d8778149e2157845342633a1ac7a393547b 2023-01-11T21:13:02.2229583Z Stored in directory: /var/lib/jenkins/.cache/pip/wheels/3f/1d/23/1c2bc47d618a44f9c949aea4b7e355e737a1f1ed208f009295 2023-01-11T21:13:02.2249025Z Successfully built triton 2023-01-11T21:13:03.1157326Z Installing collected packages: cmake, triton 2023-01-11T21:13:04.7430300Z Successfully installed cmake-3.25.0 triton-2.0.0 2023-01-11T21:13:04.8460854Z + pip_install --user jinja2 2023-01-11T21:13:04.8461300Z + pip install --progress-bar off --user jinja2 2023-01-11T21:13:05.8878088Z Collecting jinja2 2023-01-11T21:13:05.9098971Z Downloading Jinja2-3.1.2-py3-none-any.whl (133 kB) 2023-01-11T21:13:06.1780193Z Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/lib/python3.10/site-packages (from jinja2) (2.1.1) 2023-01-11T21:13:07.0817602Z Installing collected packages: jinja2 2023-01-11T21:13:07.1848124Z Successfully installed jinja2-3.1.2 2023-01-11T21:13:07.2516014Z + test_distributed 2023-01-11T21:13:07.2516848Z + echo 'Testing distributed python tests' 2023-01-11T21:13:07.2517444Z Testing distributed python tests 2023-01-11T21:13:07.2518242Z + python test/run_test.py --distributed-tests --shard 3 3 --verbose 2023-01-11T21:13:09.4528453Z Ignoring disabled issues: ['91003'] 2023-01-11T21:13:09.4916715Z /var/lib/jenkins/workspace/test/run_test.py:1169: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead. 
2023-01-11T21:13:09.4917291Z if torch.version.cuda is not None and LooseVersion(torch.version.cuda) >= "11.6": 2023-01-11T21:13:09.4922488Z Found test time stats from artifacts 2023-01-11T21:13:09.4941453Z Selected tests: 2023-01-11T21:13:09.4942020Z distributed/algorithms/quantization/test_quantization 2023-01-11T21:13:09.4942372Z distributed/test_distributed_spawn 2023-01-11T21:13:09.4945385Z distributed/pipeline/sync/test_worker 2023-01-11T21:13:09.4945763Z distributed/pipeline/sync/test_pipeline 2023-01-11T21:13:09.4946105Z distributed/pipeline/sync/test_microbatch 2023-01-11T21:13:09.4946447Z distributed/pipeline/sync/test_deferred_batch_norm 2023-01-11T21:13:09.4946776Z distributed/pipeline/sync/test_bugs 2023-01-11T21:13:09.4947099Z distributed/pipeline/sync/skip/test_tracker 2023-01-11T21:13:09.4947405Z distributed/pipeline/sync/skip/test_leak 2023-01-11T21:13:09.4948301Z distributed/pipeline/sync/skip/test_api 2023-01-11T21:13:09.4948621Z distributed/fsdp/test_shard_utils 2023-01-11T21:13:09.4948954Z distributed/checkpoint/test_2d_model_state_checkpoint 2023-01-11T21:13:09.4949302Z distributed/_shard/sharded_tensor/ops/test_math_ops 2023-01-11T21:13:09.4949635Z distributed/elastic/metrics/api_test 2023-01-11T21:13:09.4952094Z distributed/checkpoint/test_utils 2023-01-11T21:13:09.4952402Z distributed/checkpoint/test_nested_dict 2023-01-11T21:13:09.4952970Z distributed/elastic/utils/logging_test 2023-01-11T21:13:09.4953276Z distributed/elastic/utils/util_test 2023-01-11T21:13:09.4953562Z distributed/test_multi_threaded_pg 2023-01-11T21:13:09.4953864Z distributed/rpc/test_share_memory 2023-01-11T21:13:09.4954191Z distributed/elastic/utils/distributed_test 2023-01-11T21:13:09.4954504Z distributed/elastic/timer/local_timer_test 2023-01-11T21:13:09.4954829Z distributed/fsdp/test_fsdp_multiple_forward 2023-01-11T21:13:09.4955178Z distributed/_shard/sharded_tensor/ops/test_softmax 2023-01-11T21:13:09.4955546Z distributed/_shard/sharded_tensor/ops/test_embedding 
2023-01-11T21:13:09.4955851Z distributed/test_c10d_error_logger 2023-01-11T21:13:09.4956175Z distributed/_shard/sharded_tensor/ops/test_linear 2023-01-11T21:13:09.4956502Z distributed/fsdp/test_fsdp_pure_fp16 2023-01-11T21:13:09.4956839Z distributed/_shard/sharded_tensor/ops/test_elementwise_ops 2023-01-11T21:13:09.4957206Z distributed/_shard/sharding_plan/test_sharding_plan 2023-01-11T21:13:09.4957527Z distributed/_tensor/test_api 2023-01-11T21:13:09.4957809Z distributed/_composable/test_replicate 2023-01-11T21:13:09.4958152Z distributed/tensor/parallel/test_parallelize_api 2023-01-11T21:13:09.4958594Z distributed/fsdp/test_fsdp_tp_integration 2023-01-11T21:13:09.4958911Z distributed/checkpoint/test_checkpoint 2023-01-11T21:13:09.4959243Z distributed/tensor/parallel/test_tp_style 2023-01-11T21:13:09.4959588Z distributed/_shard/sharded_tensor/ops/test_matrix_ops 2023-01-11T21:13:09.4959894Z distributed/_tensor/test_matrix_ops 2023-01-11T21:13:09.4960210Z distributed/fsdp/test_fsdp_flatten_params 2023-01-11T21:13:09.4960515Z distributed/test_c10d_common 2023-01-11T21:13:09.4960800Z distributed/fsdp/test_fsdp_comm 2023-01-11T21:13:09.4961103Z distributed/fsdp/test_fsdp_freezing_weights 2023-01-11T21:13:09.4961420Z distributed/_tensor/test_device_mesh 2023-01-11T21:13:09.4961728Z distributed/fsdp/test_fsdp_grad_acc 2023-01-11T21:13:09.4962011Z distributed/fsdp/test_fsdp_misc 2023-01-11T21:13:09.4962300Z distributed/fsdp/test_wrap 2023-01-11T21:13:09.4962618Z distributed/optim/test_zero_redundancy_optimizer 2023-01-11T21:13:09.4962930Z distributed/fsdp/test_fsdp_optim_state 2023-01-11T21:13:09.4963221Z distributed/test_c10d_gloo 2023-01-11T21:13:09.4963505Z distributed/fsdp/test_fsdp_core 2023-01-11T21:13:09.5108586Z Prioritized test from test file changes. 
2023-01-11T21:13:09.5108906Z reordering tests for PR: 2023-01-11T21:13:09.5109594Z prioritized: ['distributed/checkpoint/test_2d_model_state_checkpoint', 'distributed/_tensor/test_device_mesh', 'distributed/fsdp/test_fsdp_optim_state'] 2023-01-11T21:13:09.5113948Z the rest: ['distributed/algorithms/quantization/test_quantization', 'distributed/test_distributed_spawn', 'distributed/pipeline/sync/test_worker', 'distributed/pipeline/sync/test_pipeline', 'distributed/pipeline/sync/test_microbatch', 'distributed/pipeline/sync/test_deferred_batch_norm', 'distributed/pipeline/sync/test_bugs', 'distributed/pipeline/sync/skip/test_tracker', 'distributed/pipeline/sync/skip/test_leak', 'distributed/pipeline/sync/skip/test_api', 'distributed/fsdp/test_shard_utils', 'distributed/_shard/sharded_tensor/ops/test_math_ops', 'distributed/elastic/metrics/api_test', 'distributed/checkpoint/test_utils', 'distributed/checkpoint/test_nested_dict', 'distributed/elastic/utils/logging_test', 'distributed/elastic/utils/util_test', 'distributed/test_multi_threaded_pg', 'distributed/rpc/test_share_memory', 'distributed/elastic/utils/distributed_test', 'distributed/elastic/timer/local_timer_test', 'distributed/fsdp/test_fsdp_multiple_forward', 'distributed/_shard/sharded_tensor/ops/test_softmax', 'distributed/_shard/sharded_tensor/ops/test_embedding', 'distributed/test_c10d_error_logger', 'distributed/_shard/sharded_tensor/ops/test_linear', 'distributed/fsdp/test_fsdp_pure_fp16', 'distributed/_shard/sharded_tensor/ops/test_elementwise_ops', 'distributed/_shard/sharding_plan/test_sharding_plan', 'distributed/_tensor/test_api', 'distributed/_composable/test_replicate', 'distributed/tensor/parallel/test_parallelize_api', 'distributed/fsdp/test_fsdp_tp_integration', 'distributed/checkpoint/test_checkpoint', 'distributed/tensor/parallel/test_tp_style', 'distributed/_shard/sharded_tensor/ops/test_matrix_ops', 'distributed/_tensor/test_matrix_ops', 'distributed/fsdp/test_fsdp_flatten_params', 
'distributed/test_c10d_common', 'distributed/fsdp/test_fsdp_comm', 'distributed/fsdp/test_fsdp_freezing_weights', 'distributed/fsdp/test_fsdp_grad_acc', 'distributed/fsdp/test_fsdp_misc', 'distributed/fsdp/test_wrap', 'distributed/optim/test_zero_redundancy_optimizer', 'distributed/test_c10d_gloo', 'distributed/fsdp/test_fsdp_core'] 2023-01-11T21:13:09.5116957Z 2023-01-11T21:13:09.5117506Z Downloading https://raw.githubusercontent.com/pytorch/test-infra/generated-stats/stats/slow-tests.json to /var/lib/jenkins/workspace/test/.pytorch-slow-tests.json 2023-01-11T21:13:09.5348026Z Downloading https://raw.githubusercontent.com/pytorch/test-infra/generated-stats/stats/disabled-tests-condensed.json to /var/lib/jenkins/workspace/test/.pytorch-disabled-tests.json 2023-01-11T21:13:09.5541775Z parallel (file granularity) tests: 2023-01-11T21:13:09.5542063Z 2023-01-11T21:13:09.5542441Z serial (file granularity) tests: 2023-01-11T21:13:09.5542823Z distributed/checkpoint/test_2d_model_state_checkpoint 2023-01-11T21:13:09.5543158Z distributed/_tensor/test_device_mesh 2023-01-11T21:13:09.5543454Z distributed/fsdp/test_fsdp_optim_state 2023-01-11T21:13:09.5543799Z distributed/algorithms/quantization/test_quantization 2023-01-11T21:13:09.5544133Z distributed/test_distributed_spawn 2023-01-11T21:13:09.5544431Z distributed/pipeline/sync/test_worker 2023-01-11T21:13:09.5544752Z distributed/pipeline/sync/test_pipeline 2023-01-11T21:13:09.5545076Z distributed/pipeline/sync/test_microbatch 2023-01-11T21:13:09.5545400Z distributed/pipeline/sync/test_deferred_batch_norm 2023-01-11T21:13:09.5545725Z distributed/pipeline/sync/test_bugs 2023-01-11T21:13:09.5546048Z distributed/pipeline/sync/skip/test_tracker 2023-01-11T21:13:09.5546403Z distributed/pipeline/sync/skip/test_leak 2023-01-11T21:13:09.5546703Z distributed/pipeline/sync/skip/test_api 2023-01-11T21:13:09.5547008Z distributed/fsdp/test_shard_utils 2023-01-11T21:13:09.5547339Z distributed/_shard/sharded_tensor/ops/test_math_ops 
2023-01-11T21:13:09.5547648Z distributed/elastic/metrics/api_test 2023-01-11T21:13:09.5547948Z distributed/checkpoint/test_utils 2023-01-11T21:13:09.5548255Z distributed/checkpoint/test_nested_dict 2023-01-11T21:13:09.5548544Z distributed/elastic/utils/logging_test 2023-01-11T21:13:09.5548846Z distributed/elastic/utils/util_test 2023-01-11T21:13:09.5549147Z distributed/test_multi_threaded_pg 2023-01-11T21:13:09.5549424Z distributed/rpc/test_share_memory 2023-01-11T21:13:09.5549736Z distributed/elastic/utils/distributed_test 2023-01-11T21:13:09.5550060Z distributed/elastic/timer/local_timer_test 2023-01-11T21:13:09.5550383Z distributed/fsdp/test_fsdp_multiple_forward 2023-01-11T21:13:09.5550704Z distributed/_shard/sharded_tensor/ops/test_softmax 2023-01-11T21:13:09.5551058Z distributed/_shard/sharded_tensor/ops/test_embedding 2023-01-11T21:13:09.5551379Z distributed/test_c10d_error_logger 2023-01-11T21:13:09.5551684Z distributed/_shard/sharded_tensor/ops/test_linear 2023-01-11T21:13:09.5552010Z distributed/fsdp/test_fsdp_pure_fp16 2023-01-11T21:13:09.5552357Z distributed/_shard/sharded_tensor/ops/test_elementwise_ops 2023-01-11T21:13:09.5552704Z distributed/_shard/sharding_plan/test_sharding_plan 2023-01-11T21:13:09.5553016Z distributed/_tensor/test_api 2023-01-11T21:13:09.5553315Z distributed/_composable/test_replicate 2023-01-11T21:13:09.5553632Z distributed/tensor/parallel/test_parallelize_api 2023-01-11T21:13:09.5553966Z distributed/fsdp/test_fsdp_tp_integration 2023-01-11T21:13:09.5554282Z distributed/checkpoint/test_checkpoint 2023-01-11T21:13:09.5554587Z distributed/tensor/parallel/test_tp_style 2023-01-11T21:13:09.5554928Z distributed/_shard/sharded_tensor/ops/test_matrix_ops 2023-01-11T21:13:09.5555359Z distributed/_tensor/test_matrix_ops 2023-01-11T21:13:09.5555674Z distributed/fsdp/test_fsdp_flatten_params 2023-01-11T21:13:09.5555951Z distributed/test_c10d_common 2023-01-11T21:13:09.5556240Z distributed/fsdp/test_fsdp_comm 2023-01-11T21:13:09.5556554Z 
distributed/fsdp/test_fsdp_freezing_weights 2023-01-11T21:13:09.5556847Z distributed/fsdp/test_fsdp_grad_acc 2023-01-11T21:13:09.5557142Z distributed/fsdp/test_fsdp_misc 2023-01-11T21:13:09.5557421Z distributed/fsdp/test_wrap 2023-01-11T21:13:09.5557720Z distributed/optim/test_zero_redundancy_optimizer 2023-01-11T21:13:09.5558026Z distributed/test_c10d_gloo 2023-01-11T21:13:09.5558307Z distributed/fsdp/test_fsdp_core 2023-01-11T21:13:11.7627398Z Ignoring disabled issues: ['91003'] 2023-01-11T21:13:11.7655811Z Ignoring disabled issues: ['91003'] 2023-01-11T21:13:12.1896232Z Running distributed/checkpoint/test_2d_model_state_checkpoint ... [2023-01-11 21:13:12.189118] 2023-01-11T21:13:12.1901199Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/checkpoint/test_2d_model_state_checkpoint.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:13:12.189741] 2023-01-11T21:13:18.4576824Z 2023-01-11T21:13:18.4578391Z Expand the folded group to see the log file of distributed/checkpoint/test_2d_model_state_checkpoint 2023-01-11T21:13:18.4585713Z ##[group]PRINTING LOG FILE of distributed/checkpoint/test_2d_model_state_checkpoint (/var/lib/jenkins/workspace/test/test-reports/distributed-checkpoint-test_2d_model_state_checkpoint_9kzd_apa) 2023-01-11T21:13:18.4586180Z 2023-01-11T21:13:18.4586302Z Running tests... 2023-01-11T21:13:18.4586800Z ---------------------------------------------------------------------- 2023-01-11T21:13:18.4587414Z Test results will be stored in test-reports/python-unittest/distributed.checkpoint.test_2d_model_state_checkpoint 2023-01-11T21:13:18.4587980Z test_2d_model_state_checkpoint (__main__.Test2dModelStateCheckpoint) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:13:18.4588487Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 900 2023-01-11T21:13:18.4588906Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 901 2023-01-11T21:13:18.4589517Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:13:18.4590038Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:13:18.4590598Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:13:18.4591062Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:13:18.4591636Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:13:18.4592075Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:13:18.4592623Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:13:18.4593087Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:13:18.4593524Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:13:18.4593974Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:13:18.4594458Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:13:18.4594947Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:13:18.4595600Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:13:18.4596262Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:13:18.4596703Z skip: Need at least 4 CUDA devices (3.927s) 2023-01-11T21:13:18.4597090Z 2023-01-11T21:13:18.4597369Z ---------------------------------------------------------------------- 2023-01-11T21:13:18.4597698Z Ran 1 test in 3.927s 2023-01-11T21:13:18.4597842Z 2023-01-11T21:13:18.4597956Z OK (skipped=1) 2023-01-11T21:13:18.4598112Z 2023-01-11T21:13:18.4598239Z Generating XML reports... 2023-01-11T21:13:18.4598913Z Generated XML report: test-reports/python-unittest/distributed.checkpoint.test_2d_model_state_checkpoint/TEST-Test2dModelStateCheckpoint-20230111211314.xml 2023-01-11T21:13:18.4599322Z 2023-01-11T21:13:18.4599631Z ##[endgroup] 2023-01-11T21:13:18.4600317Z FINISHED PRINTING LOG FILE of distributed/checkpoint/test_2d_model_state_checkpoint (/var/lib/jenkins/workspace/test/test-reports/distributed-checkpoint-test_2d_model_state_checkpoint_9kzd_apa) 2023-01-11T21:13:18.4600730Z 2023-01-11T21:13:18.4601003Z Running distributed/_tensor/test_device_mesh ... [2023-01-11 21:13:18.457775] 2023-01-11T21:13:18.4601682Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/_tensor/test_device_mesh.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:13:18.458048] 2023-01-11T21:14:10.5667593Z 2023-01-11T21:14:10.5668711Z Expand the folded group to see the log file of distributed/_tensor/test_device_mesh 2023-01-11T21:14:10.5670021Z ##[group]PRINTING LOG FILE of distributed/_tensor/test_device_mesh (/var/lib/jenkins/workspace/test/test-reports/distributed-_tensor-test_device_mesh_px3qazu8) 2023-01-11T21:14:10.5672235Z 2023-01-11T21:14:10.5672502Z Running tests... 
2023-01-11T21:14:10.5673282Z ---------------------------------------------------------------------- 2023-01-11T21:14:10.5673861Z Test results will be stored in test-reports/python-unittest/distributed._tensor.test_device_mesh 2023-01-11T21:14:10.5674631Z test_all_gather_1d (__main__.DeviceMeshCollectiveTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:14:10.5675109Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 1005 2023-01-11T21:14:10.5675821Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 1006 2023-01-11T21:14:10.5676273Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 1007 2023-01-11T21:14:10.5676770Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 1008 2023-01-11T21:14:10.5677384Z INFO:torch.testing._internal.common_distributed:Started process 4 with pid 1009 2023-01-11T21:14:10.5678363Z INFO:torch.testing._internal.common_distributed:Started process 5 with pid 1010 2023-01-11T21:14:10.5678882Z INFO:torch.testing._internal.common_distributed:Started process 6 with pid 1011 2023-01-11T21:14:10.5679308Z INFO:torch.testing._internal.common_distributed:Started process 7 with pid 1012 2023-01-11T21:14:10.5680209Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5680678Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5681526Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5681985Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5682851Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5683307Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5684148Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5684601Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5685447Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5685905Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5686769Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5687399Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5688279Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5688728Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5689555Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5690024Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5690856Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5691356Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5692181Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5692664Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5693598Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5694068Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5694897Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5695377Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5696221Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5697022Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5697899Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5698377Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5699233Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5699663Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5700507Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5700980Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5701659Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 7 2023-01-11T21:14:10.5702126Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 6 2023-01-11T21:14:10.5702667Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:14:10.5703309Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 4 2023-01-11T21:14:10.5703750Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T21:14:10.5704480Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T21:14:10.5704954Z INFO:torch.testing._internal.common_distributed:Starting event 
listener thread for rank 1 2023-01-11T21:14:10.5705635Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 5 2023-01-11T21:14:10.5706040Z skip: Need at least 8 CUDA devices (4.163s) 2023-01-11T21:14:10.5706526Z test_all_gather_nd (__main__.DeviceMeshCollectiveTest) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 1277 2023-01-11T21:14:10.5707311Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 1278 2023-01-11T21:14:10.5707760Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 1279 2023-01-11T21:14:10.5708430Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 1280 2023-01-11T21:14:10.5709021Z INFO:torch.testing._internal.common_distributed:Started process 4 with pid 1281 2023-01-11T21:14:10.5709682Z INFO:torch.testing._internal.common_distributed:Started process 5 with pid 1282 2023-01-11T21:14:10.5710134Z INFO:torch.testing._internal.common_distributed:Started process 6 with pid 1283 2023-01-11T21:14:10.5710568Z INFO:torch.testing._internal.common_distributed:Started process 7 with pid 1284 2023-01-11T21:14:10.5711443Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5711896Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5712723Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5713199Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5714034Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5714485Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5715409Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: 
loaded 210 disabled tests 2023-01-11T21:14:10.5715890Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5716743Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5717173Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5718004Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5718468Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5719293Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5719726Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5720523Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5721412Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5722258Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5722680Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5723484Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5723926Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5724772Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5725245Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5726130Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5726608Z 
warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5727418Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5727881Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5728709Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5729200Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5730013Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5730455Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5731359Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5731848Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5732270Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:14:10.5732968Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 6 2023-01-11T21:14:10.5733458Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 5 2023-01-11T21:14:10.5733933Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:14:10.5734382Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 7 2023-01-11T21:14:10.5734843Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T21:14:10.5735571Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T21:14:10.5736021Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 4 
2023-01-11T21:14:10.5736515Z skip: Need at least 8 CUDA devices (2.614s) 2023-01-11T21:14:10.5737373Z test_all_gather_uneven (__main__.DeviceMeshCollectiveTest) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 1549 2023-01-11T21:14:10.5737922Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 1550 2023-01-11T21:14:10.5738352Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 1551 2023-01-11T21:14:10.5738810Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 1552 2023-01-11T21:14:10.5739479Z INFO:torch.testing._internal.common_distributed:Started process 4 with pid 1553 2023-01-11T21:14:10.5739961Z INFO:torch.testing._internal.common_distributed:Started process 5 with pid 1554 2023-01-11T21:14:10.5740388Z INFO:torch.testing._internal.common_distributed:Started process 6 with pid 1555 2023-01-11T21:14:10.5740834Z INFO:torch.testing._internal.common_distributed:Started process 7 with pid 1556 2023-01-11T21:14:10.5741482Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5741939Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5742498Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5742973Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5743555Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5744006Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5744559Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5745033Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 
2023-01-11T21:14:10.5745615Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5746048Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5746896Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5747370Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5747950Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5748378Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5748959Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5749573Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5750183Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5750606Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5751166Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5751625Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5752187Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5752606Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5753164Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5753615Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5754165Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5754673Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5755248Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5755704Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5756258Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5756705Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5757494Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5757983Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5758424Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T21:14:10.5758926Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:14:10.5759400Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T21:14:10.5759868Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 4 2023-01-11T21:14:10.5760604Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 7 2023-01-11T21:14:10.5761086Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:14:10.5761544Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 5 2023-01-11T21:14:10.5761984Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 6 2023-01-11T21:14:10.5762382Z skip: Need at least 8 CUDA devices (2.614s) 2023-01-11T21:14:10.5762868Z 
test_all_reduce_1d (__main__.DeviceMeshCollectiveTest) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 1821 2023-01-11T21:14:10.5763392Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 1822 2023-01-11T21:14:10.5763809Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 1823 2023-01-11T21:14:10.5764243Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 1824 2023-01-11T21:14:10.5764677Z INFO:torch.testing._internal.common_distributed:Started process 4 with pid 1825 2023-01-11T21:14:10.5765110Z INFO:torch.testing._internal.common_distributed:Started process 5 with pid 1826 2023-01-11T21:14:10.5765522Z INFO:torch.testing._internal.common_distributed:Started process 6 with pid 1827 2023-01-11T21:14:10.5765950Z INFO:torch.testing._internal.common_distributed:Started process 7 with pid 1828 2023-01-11T21:14:10.5766556Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5767076Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5767651Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5768109Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5768676Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5769091Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5769652Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5770108Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5770654Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: 
loaded 56 slow tests 2023-01-11T21:14:10.5771091Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5771703Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5772163Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5772715Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5773152Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5773711Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5774167Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5774712Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5775158Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5775716Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5776153Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5777152Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5777603Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5778169Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5778601Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5779168Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5779606Z 
warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5780155Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5780610Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5781175Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5781609Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5782150Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5782606Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5783036Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:14:10.5783502Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T21:14:10.5784054Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:14:10.5784506Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 4 2023-01-11T21:14:10.5784968Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 5 2023-01-11T21:14:10.5785407Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T21:14:10.5785857Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 6 2023-01-11T21:14:10.5786310Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 7 2023-01-11T21:14:10.5786886Z skip: Need at least 8 CUDA devices (2.514s) 2023-01-11T21:14:10.5787461Z test_all_reduce_nd (__main__.DeviceMeshCollectiveTest) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 2093 2023-01-11T21:14:10.5788039Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 2094 2023-01-11T21:14:10.5788687Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 2095 2023-01-11T21:14:10.5789105Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 2096 2023-01-11T21:14:10.5789616Z INFO:torch.testing._internal.common_distributed:Started process 4 with pid 2097 2023-01-11T21:14:10.5790067Z INFO:torch.testing._internal.common_distributed:Started process 5 with pid 2098 2023-01-11T21:14:10.5790499Z INFO:torch.testing._internal.common_distributed:Started process 6 with pid 2099 2023-01-11T21:14:10.5790910Z INFO:torch.testing._internal.common_distributed:Started process 7 with pid 2100 2023-01-11T21:14:10.5791564Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5792014Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5792568Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5793041Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5793614Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5794053Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5794596Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5795059Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5795626Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5796040Z 
warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5796600Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5797066Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5797638Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5798056Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5798616Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5799068Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5799634Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5800048Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5800611Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5801061Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5801707Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5802148Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5802709Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5803163Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5803708Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5804143Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 
2023-01-11T21:14:10.5804700Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5805148Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5805698Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5806826Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5807425Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5807864Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5808296Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 4 2023-01-11T21:14:10.5808763Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:14:10.5809227Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 5 2023-01-11T21:14:10.5809665Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:14:10.5810116Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 7 2023-01-11T21:14:10.5810581Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 6 2023-01-11T21:14:10.5811015Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T21:14:10.5811470Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T21:14:10.5811849Z skip: Need at least 8 CUDA devices (2.514s) 2023-01-11T21:14:10.5812319Z test_all_to_all_1d (__main__.DeviceMeshCollectiveTest) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 2365 2023-01-11T21:14:10.5812819Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 2366 2023-01-11T21:14:10.5813252Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 2367 2023-01-11T21:14:10.5813685Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 2368 2023-01-11T21:14:10.5814100Z INFO:torch.testing._internal.common_distributed:Started process 4 with pid 2369 2023-01-11T21:14:10.5814529Z INFO:torch.testing._internal.common_distributed:Started process 5 with pid 2370 2023-01-11T21:14:10.5814955Z INFO:torch.testing._internal.common_distributed:Started process 6 with pid 2371 2023-01-11T21:14:10.5815380Z INFO:torch.testing._internal.common_distributed:Started process 7 with pid 2372 2023-01-11T21:14:10.5815964Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5816409Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5817341Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5817790Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5818363Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5818916Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5819489Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5819928Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5820495Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5820932Z 
warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5821495Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5821928Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5822491Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5823186Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5823753Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5824292Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5824878Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5825315Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5825857Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5826317Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5826881Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5827293Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5827858Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5828316Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5828879Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5829294Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 
2023-01-11T21:14:10.5829853Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5830307Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5830867Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5831278Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5831846Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5832298Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5832713Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 6 2023-01-11T21:14:10.5833185Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:14:10.5833636Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 4 2023-01-11T21:14:10.5834093Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 5 2023-01-11T21:14:10.5834534Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 7 2023-01-11T21:14:10.5834984Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T21:14:10.5835927Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T21:14:10.5836471Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:14:10.5836855Z skip: Need at least 8 CUDA devices (2.514s) 2023-01-11T21:14:10.5837330Z test_all_to_all_nd (__main__.DeviceMeshCollectiveTest) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 2637 2023-01-11T21:14:10.5837851Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 2638 2023-01-11T21:14:10.5838268Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 2639 2023-01-11T21:14:10.5838696Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 2640 2023-01-11T21:14:10.5839123Z INFO:torch.testing._internal.common_distributed:Started process 4 with pid 2641 2023-01-11T21:14:10.5839532Z INFO:torch.testing._internal.common_distributed:Started process 5 with pid 2642 2023-01-11T21:14:10.5839960Z INFO:torch.testing._internal.common_distributed:Started process 6 with pid 2643 2023-01-11T21:14:10.5840390Z INFO:torch.testing._internal.common_distributed:Started process 7 with pid 2644 2023-01-11T21:14:10.5841050Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5841484Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5842058Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5842518Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5843082Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5843501Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5844064Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5844521Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5845070Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5845504Z 
warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5846059Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5846509Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5847054Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5847496Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5848054Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5848486Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5849055Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5849496Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5850050Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5850484Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5851049Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5851481Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5852037Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5852471Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5853105Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5853539Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 
2023-01-11T21:14:10.5854079Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5854529Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5855091Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5855526Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5856066Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5856516Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5857382Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T21:14:10.5857836Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 4 2023-01-11T21:14:10.5858381Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:14:10.5858844Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 6 2023-01-11T21:14:10.5859300Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 7 2023-01-11T21:14:10.5859736Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T21:14:10.5860182Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 5 2023-01-11T21:14:10.5860632Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:14:10.5861015Z skip: Need at least 8 CUDA devices (2.616s) 2023-01-11T21:14:10.5861473Z test_broadcast_1d (__main__.DeviceMeshCollectiveTest) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 2909 2023-01-11T21:14:10.5862000Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 2910 2023-01-11T21:14:10.5862441Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 2911 2023-01-11T21:14:10.5862857Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 2912 2023-01-11T21:14:10.5863279Z INFO:torch.testing._internal.common_distributed:Started process 4 with pid 2913 2023-01-11T21:14:10.5863708Z INFO:torch.testing._internal.common_distributed:Started process 5 with pid 2914 2023-01-11T21:14:10.5864136Z INFO:torch.testing._internal.common_distributed:Started process 6 with pid 2915 2023-01-11T21:14:10.5864542Z INFO:torch.testing._internal.common_distributed:Started process 7 with pid 2916 2023-01-11T21:14:10.5865148Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5865594Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5866146Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5866610Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5867178Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5867619Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5868162Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5868621Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5869182Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5869597Z 
warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5870249Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5870706Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5871269Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5871685Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5872249Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5872700Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5873264Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5873678Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5874244Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5874753Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5875310Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5875740Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5876300Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5876753Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5877300Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5877738Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 
2023-01-11T21:14:10.5878287Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5878710Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5879276Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5879735Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5880308Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5880745Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5881178Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T21:14:10.5881638Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:14:10.5882098Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 7 2023-01-11T21:14:10.5882539Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T21:14:10.5882992Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 4 2023-01-11T21:14:10.5883452Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:14:10.5883883Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 5 2023-01-11T21:14:10.5884334Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 6 2023-01-11T21:14:10.5884718Z skip: Need at least 8 CUDA devices (2.616s) 2023-01-11T21:14:10.5885193Z test_broadcast_nd (__main__.DeviceMeshCollectiveTest) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 3181 2023-01-11T21:14:10.5885698Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 3182 2023-01-11T21:14:10.5886134Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 3183 2023-01-11T21:14:10.5886642Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 3184 2023-01-11T21:14:10.5887066Z INFO:torch.testing._internal.common_distributed:Started process 4 with pid 3185 2023-01-11T21:14:10.5887497Z INFO:torch.testing._internal.common_distributed:Started process 5 with pid 3186 2023-01-11T21:14:10.5887923Z INFO:torch.testing._internal.common_distributed:Started process 6 with pid 3187 2023-01-11T21:14:10.5888352Z INFO:torch.testing._internal.common_distributed:Started process 7 with pid 3188 2023-01-11T21:14:10.5888936Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5889381Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5889943Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5890391Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5890961Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5891505Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5892089Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5892527Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5893088Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5893521Z 
warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5894082Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5894520Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5895092Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5895532Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5896075Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5896527Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5897526Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5897958Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5898498Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5898948Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5899516Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5899928Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5900491Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5900940Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5901499Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5901915Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 
2023-01-11T21:14:10.5902478Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5902927Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5903487Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5904008Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5904580Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5905034Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5905443Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 7 2023-01-11T21:14:10.5905906Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T21:14:10.5906367Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:14:10.5906825Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T21:14:10.5907262Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 5 2023-01-11T21:14:10.5907717Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:14:10.5908175Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 4 2023-01-11T21:14:10.5908672Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 6 2023-01-11T21:14:10.5909067Z skip: Need at least 8 CUDA devices (2.513s) 2023-01-11T21:14:10.5909547Z test_reduce_scatter_1d (__main__.DeviceMeshCollectiveTest) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 3453 2023-01-11T21:14:10.5910072Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 3454 2023-01-11T21:14:10.5910491Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 3455 2023-01-11T21:14:10.5910918Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 3456 2023-01-11T21:14:10.5911598Z INFO:torch.testing._internal.common_distributed:Started process 4 with pid 3457 2023-01-11T21:14:10.5912015Z INFO:torch.testing._internal.common_distributed:Started process 5 with pid 3458 2023-01-11T21:14:10.5912444Z INFO:torch.testing._internal.common_distributed:Started process 6 with pid 3459 2023-01-11T21:14:10.5912874Z INFO:torch.testing._internal.common_distributed:Started process 7 with pid 3460 2023-01-11T21:14:10.5913477Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5913907Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5914473Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5914936Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5915506Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5915926Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5916493Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5916953Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5917504Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5917940Z 
warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5918497Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5918950Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5919495Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5919931Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5920594Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5921029Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5921601Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5922038Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5922597Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5923033Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5923599Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5924035Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5924592Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5925030Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5925642Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5926387Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 
2023-01-11T21:14:10.5926943Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5927397Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5927957Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5928388Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5928934Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5929397Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5929835Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:14:10.5930279Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 6 2023-01-11T21:14:10.5930746Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:14:10.5931202Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T21:14:10.5931661Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 4 2023-01-11T21:14:10.5932095Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T21:14:10.5932555Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 5 2023-01-11T21:14:10.5933014Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 7 2023-01-11T21:14:10.5933400Z skip: Need at least 8 CUDA devices (2.514s) 2023-01-11T21:14:10.5933871Z test_reduce_scatter_nd (__main__.DeviceMeshCollectiveTest) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 3725 2023-01-11T21:14:10.5934405Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 3726 2023-01-11T21:14:10.5934843Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 3727 2023-01-11T21:14:10.5935253Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 3728 2023-01-11T21:14:10.5935682Z INFO:torch.testing._internal.common_distributed:Started process 4 with pid 3729 2023-01-11T21:14:10.5936112Z INFO:torch.testing._internal.common_distributed:Started process 5 with pid 3730 2023-01-11T21:14:10.5936801Z INFO:torch.testing._internal.common_distributed:Started process 6 with pid 3731 2023-01-11T21:14:10.5937235Z INFO:torch.testing._internal.common_distributed:Started process 7 with pid 3732 2023-01-11T21:14:10.5937951Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5938403Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5938949Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5939413Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5939984Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5940430Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5940971Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5941432Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5942000Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5942507Z 
warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5943075Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5943532Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5944100Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5944519Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5945082Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5945537Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5946108Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5946525Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5947091Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5947553Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5948124Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5948545Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5949113Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5949571Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5950116Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5950561Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 
2023-01-11T21:14:10.5951129Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5951593Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5952147Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5952581Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5953142Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5953591Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5954008Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 5 2023-01-11T21:14:10.5954561Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 4 2023-01-11T21:14:10.5955017Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 7 2023-01-11T21:14:10.5955453Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:14:10.5955913Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T21:14:10.5956371Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 6 2023-01-11T21:14:10.5956825Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:14:10.5957256Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T21:14:10.5957638Z skip: Need at least 8 CUDA devices (2.513s) 2023-01-11T21:14:10.5958129Z test_reduce_scatter_uneven (__main__.DeviceMeshCollectiveTest) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 3997 2023-01-11T21:14:10.5958665Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 3998 2023-01-11T21:14:10.5959137Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 3999 2023-01-11T21:14:10.5959589Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 4000 2023-01-11T21:14:10.5960020Z INFO:torch.testing._internal.common_distributed:Started process 4 with pid 4001 2023-01-11T21:14:10.5960433Z INFO:torch.testing._internal.common_distributed:Started process 5 with pid 4002 2023-01-11T21:14:10.5960860Z INFO:torch.testing._internal.common_distributed:Started process 6 with pid 4003 2023-01-11T21:14:10.5961281Z INFO:torch.testing._internal.common_distributed:Started process 7 with pid 4004 2023-01-11T21:14:10.5961889Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5962319Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5962889Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5963350Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5963901Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5964351Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5964915Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5965370Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5965921Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5966350Z 
warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5966921Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5967374Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5967930Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5968362Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5968926Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5969372Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5969935Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5970371Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5970923Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5971400Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5971973Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5972493Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5973054Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5973513Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5974141Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5974582Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 
2023-01-11T21:14:10.5975130Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5975591Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5976230Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5977001Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5977566Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5978021Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5978455Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 7 2023-01-11T21:14:10.5978899Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:14:10.5979363Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T21:14:10.5979829Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 6 2023-01-11T21:14:10.5980286Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 4 2023-01-11T21:14:10.5980727Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T21:14:10.5981187Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 5 2023-01-11T21:14:10.5981634Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:14:10.5982015Z skip: Need at least 8 CUDA devices (2.514s) 2023-01-11T21:14:10.5982471Z test_scatter_1d (__main__.DeviceMeshCollectiveTest) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 4269 2023-01-11T21:14:10.5982987Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 4270 2023-01-11T21:14:10.5983427Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 4271 2023-01-11T21:14:10.5983841Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 4272 2023-01-11T21:14:10.5984266Z INFO:torch.testing._internal.common_distributed:Started process 4 with pid 4273 2023-01-11T21:14:10.5984695Z INFO:torch.testing._internal.common_distributed:Started process 5 with pid 4274 2023-01-11T21:14:10.5985113Z INFO:torch.testing._internal.common_distributed:Started process 6 with pid 4275 2023-01-11T21:14:10.5985534Z INFO:torch.testing._internal.common_distributed:Started process 7 with pid 4276 2023-01-11T21:14:10.5986137Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5986579Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5987131Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5987603Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5988283Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5988721Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5989254Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5989699Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5990262Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5990722Z 
warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5991284Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5991787Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5992357Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5992784Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5993416Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5993892Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5994462Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5994879Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5995444Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5995903Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5996473Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5996900Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.5997470Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5997931Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.5998477Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.5998917Z warnings.warn(f"loaded {len(slow_tests_dict)} slow 
tests") 2023-01-11T21:14:10.5999480Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.5999936Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6000493Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6000932Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6001499Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6001958Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6002369Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 7 2023-01-11T21:14:10.6002836Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T21:14:10.6003299Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:14:10.6003731Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T21:14:10.6004181Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 5 2023-01-11T21:14:10.6004625Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:14:10.6005139Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 4 2023-01-11T21:14:10.6005584Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 6 2023-01-11T21:14:10.6005962Z skip: Need at least 8 CUDA devices (2.514s) 2023-01-11T21:14:10.6006435Z test_scatter_nd (__main__.DeviceMeshCollectiveTest) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 4541 2023-01-11T21:14:10.6006943Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 4542 2023-01-11T21:14:10.6007386Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 4543 2023-01-11T21:14:10.6007818Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 4544 2023-01-11T21:14:10.6008252Z INFO:torch.testing._internal.common_distributed:Started process 4 with pid 4545 2023-01-11T21:14:10.6008658Z INFO:torch.testing._internal.common_distributed:Started process 5 with pid 4546 2023-01-11T21:14:10.6009086Z INFO:torch.testing._internal.common_distributed:Started process 6 with pid 4547 2023-01-11T21:14:10.6009576Z INFO:torch.testing._internal.common_distributed:Started process 7 with pid 4548 2023-01-11T21:14:10.6010169Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6010645Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6011215Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6011676Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6012221Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6012666Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6013232Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6013689Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6014241Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6014677Z 
warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6015239Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6015677Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6016247Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6016945Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6017519Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6017965Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6018541Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6018979Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6019537Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6019975Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6020538Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6020974Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6021517Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6022072Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6022653Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6023098Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 
2023-01-11T21:14:10.6023648Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6024110Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6024678Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6025116Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6025661Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6026123Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6026560Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:14:10.6027077Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T21:14:10.6027552Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 6 2023-01-11T21:14:10.6028011Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T21:14:10.6028456Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 4 2023-01-11T21:14:10.6028900Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 7 2023-01-11T21:14:10.6029354Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 5 2023-01-11T21:14:10.6029797Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:14:10.6030179Z skip: Need at least 8 CUDA devices (2.514s) 2023-01-11T21:14:10.6030664Z test_scatter_uneven (__main__.DeviceMeshCollectiveTest) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 4813 2023-01-11T21:14:10.6031190Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 4814 2023-01-11T21:14:10.6031637Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 4815 2023-01-11T21:14:10.6032053Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 4816 2023-01-11T21:14:10.6032484Z INFO:torch.testing._internal.common_distributed:Started process 4 with pid 4817 2023-01-11T21:14:10.6032931Z INFO:torch.testing._internal.common_distributed:Started process 5 with pid 4818 2023-01-11T21:14:10.6033358Z INFO:torch.testing._internal.common_distributed:Started process 6 with pid 4819 2023-01-11T21:14:10.6033777Z INFO:torch.testing._internal.common_distributed:Started process 7 with pid 4820 2023-01-11T21:14:10.6034381Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6034832Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6035379Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6035846Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6036426Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6036869Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6037418Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6037879Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6038447Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6038966Z 
warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6039526Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6039987Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6040555Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6040976Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6041559Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6042015Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6042584Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6043010Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6043624Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6044085Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6044635Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6045075Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6045637Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6046090Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6046641Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6047081Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 
2023-01-11T21:14:10.6047648Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6048104Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6048654Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6049096Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6049658Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6050098Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6050531Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 6 2023-01-11T21:14:10.6050997Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 5 2023-01-11T21:14:10.6051465Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 7 2023-01-11T21:14:10.6051915Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 4 2023-01-11T21:14:10.6052372Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T21:14:10.6052829Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:14:10.6053284Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T21:14:10.6053720Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:14:10.6054098Z skip: Need at least 8 CUDA devices (2.514s) 2023-01-11T21:14:10.6054550Z test_device_mesh_2d (__main__.DeviceMeshTest) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 5085 2023-01-11T21:14:10.6055102Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 5086 2023-01-11T21:14:10.6055545Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 5087 2023-01-11T21:14:10.6055986Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 5088 2023-01-11T21:14:10.6056418Z INFO:torch.testing._internal.common_distributed:Started process 4 with pid 5089 2023-01-11T21:14:10.6057012Z INFO:torch.testing._internal.common_distributed:Started process 5 with pid 5090 2023-01-11T21:14:10.6057442Z INFO:torch.testing._internal.common_distributed:Started process 6 with pid 5091 2023-01-11T21:14:10.6057870Z INFO:torch.testing._internal.common_distributed:Started process 7 with pid 5092 2023-01-11T21:14:10.6058461Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6058911Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6059479Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6059952Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6060583Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6061038Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6061610Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6062072Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6122738Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6123303Z 
warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6123951Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6124440Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6125057Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6125523Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6126120Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6126595Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6127208Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6127673Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6128258Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6128746Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6129346Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6129806Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6130384Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6130871Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6131473Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6131936Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 
2023-01-11T21:14:10.6132506Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6132988Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6133766Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6134228Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6134803Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6135280Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6135724Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 5 2023-01-11T21:14:10.6136197Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T21:14:10.6136890Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:14:10.6137389Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 7 2023-01-11T21:14:10.6137875Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T21:14:10.6138342Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:14:10.6138920Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 4 2023-01-11T21:14:10.6139424Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 6 2023-01-11T21:14:10.6139810Z skip: Need at least 8 CUDA devices (2.514s) 2023-01-11T21:14:10.6140306Z test_device_mesh_2d_from_dim_groups (__main__.DeviceMeshTest) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 5357 2023-01-11T21:14:10.6140852Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 5358 2023-01-11T21:14:10.6141308Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 5359 2023-01-11T21:14:10.6141747Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 5360 2023-01-11T21:14:10.6142208Z INFO:torch.testing._internal.common_distributed:Started process 4 with pid 5361 2023-01-11T21:14:10.6142656Z INFO:torch.testing._internal.common_distributed:Started process 5 with pid 5362 2023-01-11T21:14:10.6143096Z INFO:torch.testing._internal.common_distributed:Started process 6 with pid 5363 2023-01-11T21:14:10.6143553Z INFO:torch.testing._internal.common_distributed:Started process 7 with pid 5364 2023-01-11T21:14:10.6144190Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6144652Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6145231Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6145710Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6146306Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6146768Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6147348Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6147833Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6148428Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6148876Z 
warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6149466Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6149949Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6150548Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6151095Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6151695Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6152179Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6152760Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6153223Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6153811Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6154297Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6154875Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6155338Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6155924Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6156455Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6157048Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6157506Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 
2023-01-11T21:14:10.6158094Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6158562Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6159155Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6159614Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6160211Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6160685Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6161135Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 4 2023-01-11T21:14:10.6161615Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:14:10.6162089Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T21:14:10.6162576Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T21:14:10.6163054Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 6 2023-01-11T21:14:10.6163530Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:14:10.6164007Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 5 2023-01-11T21:14:10.6164487Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 7 2023-01-11T21:14:10.6164886Z skip: Need at least 8 CUDA devices (2.514s) 2023-01-11T21:14:10.6165368Z test_device_mesh_dim_groups_error (__main__.DeviceMeshTest) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 5629 2023-01-11T21:14:10.6165897Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 5630 2023-01-11T21:14:10.6166361Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 5631 2023-01-11T21:14:10.6166824Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 5632 2023-01-11T21:14:10.6167272Z INFO:torch.testing._internal.common_distributed:Started process 4 with pid 5633 2023-01-11T21:14:10.6167734Z INFO:torch.testing._internal.common_distributed:Started process 5 with pid 5634 2023-01-11T21:14:10.6168264Z INFO:torch.testing._internal.common_distributed:Started process 6 with pid 5635 2023-01-11T21:14:10.6168723Z INFO:torch.testing._internal.common_distributed:Started process 7 with pid 5636 2023-01-11T21:14:10.6169348Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6169802Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6170387Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6170849Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6171442Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6171896Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6172472Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6172944Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6173593Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6174065Z 
warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6174647Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6175130Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6175724Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6176186Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6177440Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6177932Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6178531Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6178985Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6179558Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6180030Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6180615Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6181061Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6181647Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6182118Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6182713Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6183159Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 
2023-01-11T21:14:10.6183746Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6184219Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6184800Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6185244Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6185825Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6186302Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6186866Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 5 2023-01-11T21:14:10.6187364Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T21:14:10.6187855Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 6 2023-01-11T21:14:10.6188341Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T21:14:10.6188809Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 4 2023-01-11T21:14:10.6189285Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:14:10.6189769Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:14:10.6190241Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 7 2023-01-11T21:14:10.6190644Z skip: Need at least 8 CUDA devices (2.514s) 2023-01-11T21:14:10.6191122Z test_device_mesh_nd (__main__.DeviceMeshTest) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 5901 2023-01-11T21:14:10.6191776Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 5902 2023-01-11T21:14:10.6192237Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 5903 2023-01-11T21:14:10.6192696Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 5904 2023-01-11T21:14:10.6193159Z INFO:torch.testing._internal.common_distributed:Started process 4 with pid 5905 2023-01-11T21:14:10.6193606Z INFO:torch.testing._internal.common_distributed:Started process 5 with pid 5906 2023-01-11T21:14:10.6194058Z INFO:torch.testing._internal.common_distributed:Started process 6 with pid 5907 2023-01-11T21:14:10.6194512Z INFO:torch.testing._internal.common_distributed:Started process 7 with pid 5908 2023-01-11T21:14:10.6195143Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6195595Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6196166Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6196615Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6197184Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6197647Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6198243Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6198731Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6199322Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6199807Z 
warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6200414Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6200895Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6201478Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6201940Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6202527Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6202995Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6203592Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6204145Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6204738Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6205213Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6205807Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6206273Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6206841Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6207328Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6207917Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6208375Z warnings.warn(f"loaded {len(slow_tests_dict)} slow 
tests") 2023-01-11T21:14:10.6208952Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6209484Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6210093Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:14:10.6210547Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:14:10.6211119Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:14:10.6211588Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:14:10.6212033Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:14:10.6212510Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T21:14:10.6213000Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 6 2023-01-11T21:14:10.6213483Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:14:10.6213969Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 5 2023-01-11T21:14:10.6214435Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T21:14:10.6214912Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 7 2023-01-11T21:14:10.6215393Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 4 2023-01-11T21:14:10.6215781Z skip: Need at least 8 CUDA devices (2.514s) 2023-01-11T21:14:10.6215979Z 2023-01-11T21:14:10.6216264Z ---------------------------------------------------------------------- 2023-01-11T21:14:10.6216808Z Ran 19 tests in 49.819s 2023-01-11T21:14:10.6216989Z 
2023-01-11T21:14:10.6217100Z OK (skipped=19) 2023-01-11T21:14:10.6217253Z 2023-01-11T21:14:10.6217379Z Generating XML reports... 2023-01-11T21:14:10.6218033Z Generated XML report: test-reports/python-unittest/distributed._tensor.test_device_mesh/TEST-DeviceMeshCollectiveTest-20230111211320.xml 2023-01-11T21:14:10.6218823Z Generated XML report: test-reports/python-unittest/distributed._tensor.test_device_mesh/TEST-DeviceMeshTest-20230111211320.xml 2023-01-11T21:14:10.6219172Z 2023-01-11T21:14:10.6219544Z ##[endgroup] 2023-01-11T21:14:10.6220170Z FINISHED PRINTING LOG FILE of distributed/_tensor/test_device_mesh (/var/lib/jenkins/workspace/test/test-reports/distributed-_tensor-test_device_mesh_px3qazu8) 2023-01-11T21:14:10.6220545Z 2023-01-11T21:14:10.6220829Z Running distributed/fsdp/test_fsdp_optim_state ... [2023-01-11 21:14:10.567610] 2023-01-11T21:14:10.6221548Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_fsdp_optim_state.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:14:10.567894] 2023-01-11T21:18:34.9291149Z 2023-01-11T21:18:34.9291650Z Expand the folded group to see the log file of distributed/fsdp/test_fsdp_optim_state 2023-01-11T21:18:34.9294432Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_fsdp_optim_state (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_optim_state_hs0nwjoo) 2023-01-11T21:18:34.9294818Z 2023-01-11T21:18:34.9294932Z Running tests... 2023-01-11T21:18:34.9297073Z ---------------------------------------------------------------------- 2023-01-11T21:18:34.9297923Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_optim_state 2023-01-11T21:18:34.9298863Z test_compatible_with_named_optimizer (__main__.TestFSDPOptimState) ... skip: The test currently fails on CI. 
(0.002s) 2023-01-11T21:18:34.9299706Z test_flatten_sharded_optim_state_dict_nested (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9300760Z Tests :meth:`flatten_sharded_optim_state_dict` for an FSDP-root ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:18:34.9301274Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 6208 2023-01-11T21:18:34.9301733Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 6209 2023-01-11T21:18:34.9302639Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9303100Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9303682Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9304148Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9304726Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9305152Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9305718Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9306196Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9306656Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9307511Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9308178Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9308898Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9309406Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9309855Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9310713Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2533: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2023-01-11T21:18:34.9311268Z warnings.warn( 2023-01-11T21:18:34.9312015Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2533: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2023-01-11T21:18:34.9312529Z warnings.warn( 2023-01-11T21:18:34.9312778Z dist init r=0, world=2 2023-01-11T21:18:34.9313029Z dist init r=1, world=2 2023-01-11T21:18:34.9313261Z ok (6.519s) 2023-01-11T21:18:34.9313602Z test_flatten_sharded_optim_state_dict_transformer (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9314276Z Tests :meth:`flatten_sharded_optim_state_dict` for an FSDP-root ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 6291 2023-01-11T21:18:34.9314954Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 6292 2023-01-11T21:18:34.9315539Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9315995Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9316935Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9317865Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9318731Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9319207Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9319767Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9320229Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9320690Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9321343Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9322236Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9322929Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9323441Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9323897Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9324751Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2533: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2023-01-11T21:18:34.9325299Z warnings.warn( 2023-01-11T21:18:34.9326056Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2533: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2023-01-11T21:18:34.9326585Z warnings.warn( 2023-01-11T21:18:34.9326819Z dist init r=0, world=2 2023-01-11T21:18:34.9327070Z dist init r=1, world=2 2023-01-11T21:18:34.9327310Z ok (5.412s) 2023-01-11T21:18:34.9327603Z test_full_optim_state_dict_keys (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9328094Z Tests that the parameter keys returned by ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 6374 2023-01-11T21:18:34.9328595Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 6375 2023-01-11T21:18:34.9329183Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9329662Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9330237Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9330707Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9331266Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9331791Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9332638Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9333107Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9333635Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9334422Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9335091Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9335755Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9336266Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9337311Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9337671Z dist init r=0, world=2 2023-01-11T21:18:34.9338102Z dist init r=1, world=2 2023-01-11T21:18:34.9338481Z ok (4.713s) 2023-01-11T21:18:34.9338814Z test_full_optim_state_dict_nested_invalid (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9339313Z Tests that :meth:`full_optim_state_dict` raises an error when ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 6457 2023-01-11T21:18:34.9339826Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 6458 2023-01-11T21:18:34.9340547Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9341013Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9341571Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9342034Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9342605Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9343027Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9343726Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9344406Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9344856Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9345327Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 
2023-01-11T21:18:34.9345980Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9346662Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9347173Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9347618Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9347962Z dist init r=1, world=2 2023-01-11T21:18:34.9348215Z dist init r=0, world=2 2023-01-11T21:18:34.9348438Z ok (4.712s) 2023-01-11T21:18:34.9348739Z test_optim_input_warning (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9349245Z Tests that passing the ``optim_input`` argument into optimizer state ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 6540 2023-01-11T21:18:34.9349751Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 6541 2023-01-11T21:18:34.9350353Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9350795Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9351358Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9351801Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9352368Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9352910Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9353463Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9353919Z warnings.warn(f"loaded 
{len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9354364Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9354850Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9355480Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9356158Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9356667Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9357133Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9357466Z dist init r=0, world=2 2023-01-11T21:18:34.9357767Z dist init r=1, world=2 2023-01-11T21:18:34.9358013Z ok (4.712s) 2023-01-11T21:18:34.9358473Z test_optim_state_dict_nested_state_dict_type_StateDictType_FULL_STATE_DICT_use_multiple_param_groups_False_rank0_only_False_use_diff_optim_inputs_False (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9359129Z Tests :meth:`full_optim_state_dict` and meth:`sharded_optim_state_dict` ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 6623 2023-01-11T21:18:34.9359648Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 6624 2023-01-11T21:18:34.9360244Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9360670Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9361238Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9361705Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9362257Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9362694Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9363253Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9363712Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9364139Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9364627Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9365276Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9365952Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9366446Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9366910Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9367801Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9368356Z warnings.warn( 2023-01-11T21:18:34.9369121Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9369730Z warnings.warn( 2023-01-11T21:18:34.9369978Z dist init r=0, world=2 2023-01-11T21:18:34.9370206Z dist init r=1, world=2 2023-01-11T21:18:34.9370438Z ok (4.812s) 2023-01-11T21:18:34.9370911Z test_optim_state_dict_nested_state_dict_type_StateDictType_FULL_STATE_DICT_use_multiple_param_groups_False_rank0_only_False_use_diff_optim_inputs_True (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9371568Z Tests :meth:`full_optim_state_dict` and meth:`sharded_optim_state_dict` ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 6706 2023-01-11T21:18:34.9372075Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 6707 2023-01-11T21:18:34.9372675Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9373121Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9373686Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9374180Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9374758Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9375197Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9375742Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9376202Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9377008Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9377505Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9378151Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9378829Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9379342Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9379803Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9380680Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9381231Z warnings.warn( 2023-01-11T21:18:34.9382015Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9382555Z warnings.warn( 2023-01-11T21:18:34.9382784Z dist init r=0, world=2 2023-01-11T21:18:34.9383031Z dist init r=1, world=2 2023-01-11T21:18:34.9383266Z ok (4.812s) 2023-01-11T21:18:34.9383718Z test_optim_state_dict_nested_state_dict_type_StateDictType_FULL_STATE_DICT_use_multiple_param_groups_False_rank0_only_True_use_diff_optim_inputs_False (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9384369Z Tests :meth:`full_optim_state_dict` and meth:`sharded_optim_state_dict` ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 6789 2023-01-11T21:18:34.9384895Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 6790 2023-01-11T21:18:34.9385493Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9386016Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9386595Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9387057Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9387607Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9388042Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9388603Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9389056Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9389484Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9389968Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9390621Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9391372Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9391880Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9392344Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9393237Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9393787Z warnings.warn( 2023-01-11T21:18:34.9394548Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9395100Z warnings.warn( 2023-01-11T21:18:34.9395349Z dist init r=1, world=2 2023-01-11T21:18:34.9395580Z dist init r=0, world=2 2023-01-11T21:18:34.9395818Z ok (4.912s) 2023-01-11T21:18:34.9396293Z test_optim_state_dict_nested_state_dict_type_StateDictType_FULL_STATE_DICT_use_multiple_param_groups_False_rank0_only_True_use_diff_optim_inputs_True (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9396951Z Tests :meth:`full_optim_state_dict` and meth:`sharded_optim_state_dict` ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 6872 2023-01-11T21:18:34.9397460Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 6873 2023-01-11T21:18:34.9398057Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9398499Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9399052Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9399526Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9400096Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9400584Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9401142Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9401607Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9402060Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9402626Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9403259Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9403940Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9404456Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9404928Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9405827Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9406382Z warnings.warn( 2023-01-11T21:18:34.9407227Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9407782Z warnings.warn( 2023-01-11T21:18:34.9408014Z dist init r=1, world=2 2023-01-11T21:18:34.9408264Z dist init r=0, world=2 2023-01-11T21:18:34.9408506Z ok (4.912s) 2023-01-11T21:18:34.9408968Z test_optim_state_dict_nested_state_dict_type_StateDictType_FULL_STATE_DICT_use_multiple_param_groups_True_rank0_only_False_use_diff_optim_inputs_False (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9409645Z Tests :meth:`full_optim_state_dict` and meth:`sharded_optim_state_dict` ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 6955 2023-01-11T21:18:34.9410180Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 6956 2023-01-11T21:18:34.9410787Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9411236Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9411788Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9412244Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9412814Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9413250Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9413796Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9414252Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9414697Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9415172Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9415819Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9416503Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9417407Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9417860Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9418755Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9419334Z warnings.warn( 2023-01-11T21:18:34.9420252Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9420788Z warnings.warn( 2023-01-11T21:18:34.9421040Z dist init r=1, world=2 2023-01-11T21:18:34.9421300Z dist init r=0, world=2 2023-01-11T21:18:34.9421544Z ok (4.912s) 2023-01-11T21:18:34.9422010Z test_optim_state_dict_nested_state_dict_type_StateDictType_FULL_STATE_DICT_use_multiple_param_groups_True_rank0_only_False_use_diff_optim_inputs_True (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9422673Z Tests :meth:`full_optim_state_dict` and meth:`sharded_optim_state_dict` ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 7038 2023-01-11T21:18:34.9423202Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 7039 2023-01-11T21:18:34.9423810Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9424236Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9424873Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9425347Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9425926Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9426345Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9426910Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9427369Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9427798Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9428294Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9428948Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9429629Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9430140Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9430614Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9431518Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9432073Z warnings.warn( 2023-01-11T21:18:34.9432862Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9433384Z warnings.warn( 2023-01-11T21:18:34.9433628Z dist init r=0, world=2 2023-01-11T21:18:34.9433875Z dist init r=1, world=2 2023-01-11T21:18:34.9434092Z ok (4.912s) 2023-01-11T21:18:34.9434562Z test_optim_state_dict_nested_state_dict_type_StateDictType_FULL_STATE_DICT_use_multiple_param_groups_True_rank0_only_True_use_diff_optim_inputs_False (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9435213Z Tests :meth:`full_optim_state_dict` and meth:`sharded_optim_state_dict` ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 7121 2023-01-11T21:18:34.9435732Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 7122 2023-01-11T21:18:34.9436376Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9436833Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9437399Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9437858Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9440132Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9440620Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9441246Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9441726Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9442161Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9442667Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9443407Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9444112Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9444615Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9445084Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9445985Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9446548Z warnings.warn( 2023-01-11T21:18:34.9447322Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9447878Z warnings.warn( 2023-01-11T21:18:34.9448127Z dist init r=1, world=2 2023-01-11T21:18:34.9448359Z dist init r=0, world=2 2023-01-11T21:18:34.9448596Z ok (4.812s) 2023-01-11T21:18:34.9449491Z test_optim_state_dict_nested_state_dict_type_StateDictType_FULL_STATE_DICT_use_multiple_param_groups_True_rank0_only_True_use_diff_optim_inputs_True (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9450157Z Tests :meth:`full_optim_state_dict` and meth:`sharded_optim_state_dict` ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 7204 2023-01-11T21:18:34.9450671Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 7205 2023-01-11T21:18:34.9451291Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9451744Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9452316Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9452764Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9453338Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9453783Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9454331Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9454793Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9455318Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9455821Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9456463Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9457589Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9458118Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9458593Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9459472Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9460036Z warnings.warn( 2023-01-11T21:18:34.9460922Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9461490Z warnings.warn( 2023-01-11T21:18:34.9461721Z dist init r=1, world=2 2023-01-11T21:18:34.9461971Z dist init r=0, world=2 2023-01-11T21:18:34.9462209Z ok (4.912s) 2023-01-11T21:18:34.9462672Z test_optim_state_dict_nested_state_dict_type_StateDictType_SHARDED_STATE_DICT_use_multiple_param_groups_False_rank0_only_False_use_diff_optim_inputs_False (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9463339Z Tests :meth:`full_optim_state_dict` and meth:`sharded_optim_state_dict` ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 7287 2023-01-11T21:18:34.9463866Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 7288 2023-01-11T21:18:34.9464478Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9464912Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9465485Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9465950Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9466527Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9466952Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9467519Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9467979Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9468414Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9468908Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9469562Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9470244Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9470747Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9471215Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9472077Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2533: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2023-01-11T21:18:34.9472706Z warnings.warn( 2023-01-11T21:18:34.9473443Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2533: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2023-01-11T21:18:34.9473981Z warnings.warn( 2023-01-11T21:18:34.9474233Z dist init r=0, world=2 2023-01-11T21:18:34.9474466Z dist init r=1, world=2 2023-01-11T21:18:34.9474702Z ok (4.913s) 2023-01-11T21:18:34.9475180Z test_optim_state_dict_nested_state_dict_type_StateDictType_SHARDED_STATE_DICT_use_multiple_param_groups_False_rank0_only_False_use_diff_optim_inputs_True (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9475834Z Tests :meth:`full_optim_state_dict` and meth:`sharded_optim_state_dict` ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 7370 2023-01-11T21:18:34.9476341Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 7371 2023-01-11T21:18:34.9476949Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9477442Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9478030Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9478481Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9479062Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9479501Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9480047Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9480510Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9480964Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9481460Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9482098Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9482778Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9483298Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9483772Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9484613Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2533: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2023-01-11T21:18:34.9485161Z warnings.warn( 2023-01-11T21:18:34.9485915Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2533: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2023-01-11T21:18:34.9486452Z warnings.warn( 2023-01-11T21:18:34.9486685Z dist init r=0, world=2 2023-01-11T21:18:34.9486934Z dist init r=1, world=2 2023-01-11T21:18:34.9487170Z ok (4.912s) 2023-01-11T21:18:34.9487631Z test_optim_state_dict_nested_state_dict_type_StateDictType_SHARDED_STATE_DICT_use_multiple_param_groups_False_rank0_only_True_use_diff_optim_inputs_False (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9488291Z Tests :meth:`full_optim_state_dict` and meth:`sharded_optim_state_dict` ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 7453 2023-01-11T21:18:34.9488816Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 7454 2023-01-11T21:18:34.9489489Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9489921Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9490491Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9490957Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9491511Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9491953Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9492520Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9492977Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9493414Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9493960Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9494633Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9495298Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9495820Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9496294Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9496996Z dist init r=1, world=2 2023-01-11T21:18:34.9497232Z dist init r=0, world=2 2023-01-11T21:18:34.9497475Z ok (4.211s) 2023-01-11T21:18:34.9497962Z test_optim_state_dict_nested_state_dict_type_StateDictType_SHARDED_STATE_DICT_use_multiple_param_groups_False_rank0_only_True_use_diff_optim_inputs_True (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9498614Z Tests :meth:`full_optim_state_dict` and meth:`sharded_optim_state_dict` ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 7532 2023-01-11T21:18:34.9499145Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 7533 2023-01-11T21:18:34.9499759Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9500209Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9500809Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9501283Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9501860Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9502307Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9502858Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9503322Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9503768Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 
2023-01-11T21:18:34.9504245Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9504894Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9505578Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9506092Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9506638Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9506983Z dist init r=0, world=2 2023-01-11T21:18:34.9507239Z dist init r=1, world=2 2023-01-11T21:18:34.9507460Z ok (4.211s) 2023-01-11T21:18:34.9507941Z test_optim_state_dict_nested_state_dict_type_StateDictType_SHARDED_STATE_DICT_use_multiple_param_groups_True_rank0_only_False_use_diff_optim_inputs_False (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9508604Z Tests :meth:`full_optim_state_dict` and meth:`sharded_optim_state_dict` ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 7611 2023-01-11T21:18:34.9509129Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 7612 2023-01-11T21:18:34.9509719Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9510166Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9510744Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9511280Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9511851Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9512297Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9512864Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9513307Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9513753Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9514243Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9514899Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9515569Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9516086Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9516549Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9517413Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2533: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2023-01-11T21:18:34.9517937Z warnings.warn( 2023-01-11T21:18:34.9518683Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2533: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2023-01-11T21:18:34.9519224Z warnings.warn( 2023-01-11T21:18:34.9519475Z dist init r=0, world=2 2023-01-11T21:18:34.9519708Z dist init r=1, world=2 2023-01-11T21:18:34.9519943Z ok (4.912s) 2023-01-11T21:18:34.9520418Z test_optim_state_dict_nested_state_dict_type_StateDictType_SHARDED_STATE_DICT_use_multiple_param_groups_True_rank0_only_False_use_diff_optim_inputs_True (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9521062Z Tests :meth:`full_optim_state_dict` and meth:`sharded_optim_state_dict` ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 7694 2023-01-11T21:18:34.9521587Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 7695 2023-01-11T21:18:34.9522193Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9522645Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9523289Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9523763Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9524337Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9524762Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9525323Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9525777Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9526226Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9526698Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9527358Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9528089Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9528620Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9529069Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9529926Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2533: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2023-01-11T21:18:34.9530464Z warnings.warn( 2023-01-11T21:18:34.9531211Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2533: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2023-01-11T21:18:34.9531735Z warnings.warn( 2023-01-11T21:18:34.9531985Z dist init r=1, world=2 2023-01-11T21:18:34.9532241Z dist init r=0, world=2 2023-01-11T21:18:34.9532459Z ok (4.913s) 2023-01-11T21:18:34.9532937Z test_optim_state_dict_nested_state_dict_type_StateDictType_SHARDED_STATE_DICT_use_multiple_param_groups_True_rank0_only_True_use_diff_optim_inputs_False (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9533598Z Tests :meth:`full_optim_state_dict` and meth:`sharded_optim_state_dict` ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 7777 2023-01-11T21:18:34.9534121Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 7778 2023-01-11T21:18:34.9534706Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9535150Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9535726Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9536176Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9537300Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9537755Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9538333Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9538777Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9539235Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9539724Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9540486Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9541158Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9541679Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9542145Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9542482Z dist init r=0, world=2 2023-01-11T21:18:34.9542735Z dist init r=1, world=2 2023-01-11T21:18:34.9542971Z ok (4.212s) 2023-01-11T21:18:34.9543432Z test_optim_state_dict_nested_state_dict_type_StateDictType_SHARDED_STATE_DICT_use_multiple_param_groups_True_rank0_only_True_use_diff_optim_inputs_True (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9544089Z Tests :meth:`full_optim_state_dict` and meth:`sharded_optim_state_dict` ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 7856 2023-01-11T21:18:34.9544618Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 7857 2023-01-11T21:18:34.9545290Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9545731Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9546308Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9546778Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9547357Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9547781Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9548346Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9548807Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9549259Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 
2023-01-11T21:18:34.9549733Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9550383Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9551064Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9551560Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9552028Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9552379Z dist init r=1, world=2 2023-01-11T21:18:34.9552630Z dist init r=0, world=2 2023-01-11T21:18:34.9552858Z ok (4.211s) 2023-01-11T21:18:34.9553287Z test_rekey_optim_state_dict_to_ids_state_dict_type_StateDictType_FULL_STATE_DICT_use_multiple_param_groups_False (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9553892Z Tests :meth:`rekey_optim_state_dict` with the new keys being ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 7935 2023-01-11T21:18:34.9554389Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 7936 2023-01-11T21:18:34.9554997Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9555447Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9556025Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9556471Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9557039Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9557544Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9558100Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9558567Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9559016Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9559510Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9560146Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9560829Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9561351Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9561819Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9562751Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9563324Z warnings.warn( 2023-01-11T21:18:34.9564113Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9564661Z warnings.warn( 2023-01-11T21:18:34.9564889Z dist init r=0, world=2 2023-01-11T21:18:34.9565142Z dist init r=1, world=2 2023-01-11T21:18:34.9565383Z ok (4.912s) 2023-01-11T21:18:34.9565795Z test_rekey_optim_state_dict_to_ids_state_dict_type_StateDictType_FULL_STATE_DICT_use_multiple_param_groups_True (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9566400Z Tests :meth:`rekey_optim_state_dict` with the new keys being ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 8018 2023-01-11T21:18:34.9566915Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 8019 2023-01-11T21:18:34.9567522Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9567953Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9568525Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9568999Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9569552Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9570000Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9570572Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9571030Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9571462Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9571953Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9572599Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9573279Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9573863Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9574332Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9575238Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9575797Z warnings.warn( 2023-01-11T21:18:34.9576781Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9577353Z warnings.warn( 2023-01-11T21:18:34.9577604Z dist init r=1, world=2 2023-01-11T21:18:34.9577835Z dist init r=0, world=2 2023-01-11T21:18:34.9578080Z ok (5.013s) 2023-01-11T21:18:34.9578512Z test_rekey_optim_state_dict_to_ids_state_dict_type_StateDictType_SHARDED_STATE_DICT_use_multiple_param_groups_False (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9579202Z Tests :meth:`rekey_optim_state_dict` with the new keys being ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 8101 2023-01-11T21:18:34.9579714Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 8102 2023-01-11T21:18:34.9580335Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9580786Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9581339Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9581805Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9582374Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9582819Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9583371Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9583832Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9584278Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9584771Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9585407Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9586091Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9586613Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9587063Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9587924Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2533: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2023-01-11T21:18:34.9588475Z warnings.warn( 2023-01-11T21:18:34.9589224Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2533: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2023-01-11T21:18:34.9589762Z warnings.warn( 2023-01-11T21:18:34.9589993Z dist init r=0, world=2 2023-01-11T21:18:34.9590242Z dist init r=1, world=2 2023-01-11T21:18:34.9590474Z ok (4.712s) 2023-01-11T21:18:34.9590884Z test_rekey_optim_state_dict_to_ids_state_dict_type_StateDictType_SHARDED_STATE_DICT_use_multiple_param_groups_True (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9591573Z Tests :meth:`rekey_optim_state_dict` with the new keys being ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 8184 2023-01-11T21:18:34.9592089Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 8185 2023-01-11T21:18:34.9592676Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9593124Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9593698Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9594160Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9594712Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9595157Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9595767Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9596236Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9596670Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9597166Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9597819Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9598486Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9599002Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9599474Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9600336Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2533: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2023-01-11T21:18:34.9600918Z warnings.warn( 2023-01-11T21:18:34.9601670Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2533: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2023-01-11T21:18:34.9602210Z warnings.warn( 2023-01-11T21:18:34.9602463Z dist init r=0, world=2 2023-01-11T21:18:34.9602696Z dist init r=1, world=2 2023-01-11T21:18:34.9602934Z ok (4.812s) 2023-01-11T21:18:34.9603251Z test_rekey_optim_state_dict_to_names (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9603744Z Tests :meth:`rekey_optim_state_dict` with the new keys being ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 8267 2023-01-11T21:18:34.9604264Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 8268 2023-01-11T21:18:34.9604872Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9605322Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9605876Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9606342Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9606915Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9607338Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9607976Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9608440Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9608898Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9609374Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9610025Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9610708Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9611225Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9611672Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9612612Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9613184Z warnings.warn( 2023-01-11T21:18:34.9613981Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9614511Z warnings.warn( 2023-01-11T21:18:34.9614764Z dist init r=0, world=2 2023-01-11T21:18:34.9615013Z dist init r=1, world=2 2023-01-11T21:18:34.9615231Z ok (5.012s) 2023-01-11T21:18:34.9615619Z test_save_load_without_0th_param_state_state_dict_type_StateDictType_FULL_STATE_DICT (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9616200Z Tests saving and loading an optim state dict for Adam optimizer (i.e. ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 8350 2023-01-11T21:18:34.9617001Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 8351 2023-01-11T21:18:34.9617599Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9618050Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9618624Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9619072Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9619645Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9620089Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9620660Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9621110Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9621565Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9622061Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9622691Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9623378Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9623896Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9624367Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9624796Z dist init r=0, world=2 2023-01-11T21:18:34.9625048Z dist init r=1, world=2 2023-01-11T21:18:34.9625288Z ok (4.612s) 2023-01-11T21:18:34.9625673Z test_save_load_without_0th_param_state_state_dict_type_StateDictType_SHARDED_STATE_DICT (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9626264Z Tests saving and loading an optim state dict for Adam optimizer (i.e. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 8433 2023-01-11T21:18:34.9626792Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 8434 2023-01-11T21:18:34.9627401Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9627832Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9628404Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9628873Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9629452Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9629940Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9630521Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9630987Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9631423Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9631916Z INFO:torch.distributed.distributed_c10d:Added key: 
store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9632567Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9633251Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9633755Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9634230Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9635093Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2533: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2023-01-11T21:18:34.9635640Z warnings.warn( 2023-01-11T21:18:34.9636372Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2533: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2023-01-11T21:18:34.9636908Z warnings.warn( 2023-01-11T21:18:34.9637156Z dist init r=0, world=2 2023-01-11T21:18:34.9637390Z dist init r=1, world=2 2023-01-11T21:18:34.9637631Z ok (4.713s) 2023-01-11T21:18:34.9637981Z test_scatter_full_optim_state_dict_nested_halve_world_size (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9638667Z Tests :meth:`scatter_full_optim_state_dict` for a non-FSDP-root ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 8516 2023-01-11T21:18:34.9639179Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 8517 2023-01-11T21:18:34.9639781Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9640226Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9640779Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9641244Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9641819Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9642340Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9642894Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9643359Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9643809Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9644309Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9644944Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9645632Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9646150Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9646601Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9647132Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:18:34.9647635Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:18:34.9648293Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:18:34.9649093Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2023-01-11T21:18:34.9649576Z warnings.warn( 2023-01-11T21:18:34.9650109Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:18:34.9651061Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9651604Z warnings.warn( 2023-01-11T21:18:34.9652384Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 
2023-01-11T21:18:34.9652942Z warnings.warn( 2023-01-11T21:18:34.9653321Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T21:18:34.9653794Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T21:18:34.9654443Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T21:18:34.9655127Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T21:18:34.9655535Z dist init r=0, world=2 2023-01-11T21:18:34.9655768Z dist init r=1, world=2 2023-01-11T21:18:34.9656008Z ok (5.112s) 2023-01-11T21:18:34.9656443Z test_scatter_full_optim_state_dict_nested_use_multiple_param_groups_False_wrap_alt_False_use_diff_optim_inputs_False (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9657468Z Tests :meth:`scatter_full_optim_state_dict` for a non-FSDP-root ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 8609 2023-01-11T21:18:34.9658004Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 8610 2023-01-11T21:18:34.9658604Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9659051Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9659699Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9660166Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9660744Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9661165Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9661733Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9662194Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9662645Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9663119Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9663774Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9664529Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9665062Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9665511Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9666411Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9666971Z warnings.warn( 2023-01-11T21:18:34.9667760Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9668292Z warnings.warn( 2023-01-11T21:18:34.9668543Z dist init r=1, world=2 2023-01-11T21:18:34.9668792Z dist init r=0, world=2 2023-01-11T21:18:34.9669011Z ok (5.112s) 2023-01-11T21:18:34.9669443Z test_scatter_full_optim_state_dict_nested_use_multiple_param_groups_False_wrap_alt_False_use_diff_optim_inputs_True (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9670189Z Tests :meth:`scatter_full_optim_state_dict` for a non-FSDP-root ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 8692 2023-01-11T21:18:34.9670722Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 8693 2023-01-11T21:18:34.9671305Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9671750Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9672326Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9672777Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9673351Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9673795Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9674369Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9674814Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9675265Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9675762Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9676464Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9677150Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9677671Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9678140Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9679018Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9679579Z warnings.warn( 2023-01-11T21:18:34.9680364Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9680922Z warnings.warn( 2023-01-11T21:18:34.9681200Z dist init r=1, world=2 2023-01-11T21:18:34.9681458Z dist init r=0, world=2 2023-01-11T21:18:34.9681696Z ok (5.113s) 2023-01-11T21:18:34.9682116Z test_scatter_full_optim_state_dict_nested_use_multiple_param_groups_False_wrap_alt_True_use_diff_optim_inputs_False (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9682865Z Tests :meth:`scatter_full_optim_state_dict` for a non-FSDP-root ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 8775 2023-01-11T21:18:34.9683395Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 8776 2023-01-11T21:18:34.9683991Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9684419Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9684995Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9685467Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9686042Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9686467Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9687036Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9687499Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9687932Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9688427Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9689084Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9689772Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9690274Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9690743Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9691642Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9692204Z warnings.warn( 2023-01-11T21:18:34.9692974Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9693573Z warnings.warn( 2023-01-11T21:18:34.9693823Z dist init r=1, world=2 2023-01-11T21:18:34.9694078Z dist init r=0, world=2 2023-01-11T21:18:34.9694300Z ok (5.012s) 2023-01-11T21:18:34.9694731Z test_scatter_full_optim_state_dict_nested_use_multiple_param_groups_False_wrap_alt_True_use_diff_optim_inputs_True (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9695482Z Tests :meth:`scatter_full_optim_state_dict` for a non-FSDP-root ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 8858 2023-01-11T21:18:34.9695993Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 8859 2023-01-11T21:18:34.9696796Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9697258Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9697840Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9698392Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9698986Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9699427Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9699975Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9700439Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9700944Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9701443Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9703147Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9703890Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9704415Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9704886Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9705771Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9706328Z warnings.warn( 2023-01-11T21:18:34.9707120Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9707680Z warnings.warn( 2023-01-11T21:18:34.9707911Z dist init r=1, world=2 2023-01-11T21:18:34.9708167Z dist init r=0, world=2 2023-01-11T21:18:34.9708409Z ok (5.112s) 2023-01-11T21:18:34.9708827Z test_scatter_full_optim_state_dict_nested_use_multiple_param_groups_True_wrap_alt_False_use_diff_optim_inputs_False (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9709579Z Tests :meth:`scatter_full_optim_state_dict` for a non-FSDP-root ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 8941 2023-01-11T21:18:34.9710110Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 8942 2023-01-11T21:18:34.9710714Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9711143Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9711838Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9712362Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9712919Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9713365Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9713935Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9714397Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9714830Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9715328Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9715986Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9716738Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9717249Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9717719Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9718624Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9719189Z warnings.warn( 2023-01-11T21:18:34.9719956Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9720515Z warnings.warn( 2023-01-11T21:18:34.9720767Z dist init r=1, world=2 2023-01-11T21:18:34.9721003Z dist init r=0, world=2 2023-01-11T21:18:34.9721246Z ok (5.112s) 2023-01-11T21:18:34.9721679Z test_scatter_full_optim_state_dict_nested_use_multiple_param_groups_True_wrap_alt_False_use_diff_optim_inputs_True (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9722427Z Tests :meth:`scatter_full_optim_state_dict` for a non-FSDP-root ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 9024 2023-01-11T21:18:34.9722936Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 9025 2023-01-11T21:18:34.9723539Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9723992Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9724550Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9725023Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9725600Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9726043Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9726595Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9727060Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9727513Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9728013Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9728656Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9729411Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9729932Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9730389Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9731290Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9731854Z warnings.warn( 2023-01-11T21:18:34.9732646Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9733200Z warnings.warn( 2023-01-11T21:18:34.9733430Z dist init r=0, world=2 2023-01-11T21:18:34.9733729Z dist init r=1, world=2 2023-01-11T21:18:34.9733978Z ok (5.113s) 2023-01-11T21:18:34.9734394Z test_scatter_full_optim_state_dict_nested_use_multiple_param_groups_True_wrap_alt_True_use_diff_optim_inputs_False (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9735150Z Tests :meth:`scatter_full_optim_state_dict` for a non-FSDP-root ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 9107 2023-01-11T21:18:34.9735685Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 9108 2023-01-11T21:18:34.9736267Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9737013Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9737605Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9738074Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9738637Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9739085Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9739713Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9740176Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9740611Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9741114Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9741774Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9742449Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9742972Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9743441Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9744341Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9744883Z warnings.warn( 2023-01-11T21:18:34.9745673Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9746318Z warnings.warn( 2023-01-11T21:18:34.9746568Z dist init r=0, world=2 2023-01-11T21:18:34.9746807Z dist init r=1, world=2 2023-01-11T21:18:34.9747047Z ok (5.013s) 2023-01-11T21:18:34.9747478Z test_scatter_full_optim_state_dict_nested_use_multiple_param_groups_True_wrap_alt_True_use_diff_optim_inputs_True (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9748209Z Tests :meth:`scatter_full_optim_state_dict` for a non-FSDP-root ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 9190 2023-01-11T21:18:34.9748737Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 9191 2023-01-11T21:18:34.9749342Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9749811Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9750410Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9750862Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9751499Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9751954Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9752507Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9752972Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9753424Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9753920Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9754557Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9755250Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9755770Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9756239Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9757125Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9757681Z warnings.warn( 2023-01-11T21:18:34.9758473Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9759024Z warnings.warn( 2023-01-11T21:18:34.9759255Z dist init r=1, world=2 2023-01-11T21:18:34.9759513Z dist init r=0, world=2 2023-01-11T21:18:34.9759754Z ok (5.013s) 2023-01-11T21:18:34.9760076Z test_scatter_full_optim_state_dict_transformer (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9760733Z Tests :meth:`scatter_full_optim_state_dict` for an FSDP-root ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 9273 2023-01-11T21:18:34.9761253Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 9274 2023-01-11T21:18:34.9761839Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9762288Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9762858Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9763401Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9763962Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9764406Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9764975Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9765436Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9765868Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9766363Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9767018Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9767688Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9768254Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9768731Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9769214Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:18:34.9769685Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:18:34.9770339Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:18:34.9771020Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:18:34.9771839Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2023-01-11T21:18:34.9772308Z warnings.warn( 2023-01-11T21:18:34.9773199Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T21:18:34.9774446Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T21:18:34.9775181Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T21:18:34.9775678Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T21:18:34.9776313Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T21:18:34.9777201Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T21:18:34.9777609Z dist init r=1, world=2 2023-01-11T21:18:34.9777843Z dist init r=0, world=2 2023-01-11T21:18:34.9778086Z ok (5.513s) 2023-01-11T21:18:34.9778438Z test_shard_full_optim_state_dict_nested_halve_world_size (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9779124Z Tests :meth:`shard_full_optim_state_dict` for a non-FSDP-root model ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 9366 2023-01-11T21:18:34.9779637Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 9367 2023-01-11T21:18:34.9780336Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9780789Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9781346Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9781814Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9782387Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9782829Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9783376Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 
disabled tests 2023-01-11T21:18:34.9783839Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9784288Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9784785Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9785474Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9786171Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9786690Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9787141Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9787623Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:18:34.9788111Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:18:34.9788757Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:18:34.9789569Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2023-01-11T21:18:34.9790054Z warnings.warn( 2023-01-11T21:18:34.9790588Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:18:34.9791539Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. 
You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9792079Z warnings.warn( 2023-01-11T21:18:34.9792868Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9793418Z warnings.warn( 2023-01-11T21:18:34.9793798Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T21:18:34.9794273Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T21:18:34.9794928Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T21:18:34.9795608Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T21:18:34.9796005Z dist init r=0, world=2 2023-01-11T21:18:34.9796238Z dist init r=1, world=2 2023-01-11T21:18:34.9797409Z ok (5.112s) 2023-01-11T21:18:34.9797855Z test_shard_full_optim_state_dict_nested_use_multiple_param_groups_False_wrap_alt_False_use_diff_optim_inputs_False (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9798690Z Tests :meth:`shard_full_optim_state_dict` for a non-FSDP-root model ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 9459 2023-01-11T21:18:34.9799229Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 9460 2023-01-11T21:18:34.9799834Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9800285Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9800890Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9801366Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9801941Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9802364Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9802939Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9803455Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9803917Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9804397Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9805059Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9805745Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9806268Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9806762Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9807671Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9808234Z warnings.warn( 2023-01-11T21:18:34.9809023Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9809554Z warnings.warn( 2023-01-11T21:18:34.9809804Z dist init r=0, world=2 2023-01-11T21:18:34.9810056Z dist init r=1, world=2 2023-01-11T21:18:34.9810276Z ok (5.112s) 2023-01-11T21:18:34.9810705Z test_shard_full_optim_state_dict_nested_use_multiple_param_groups_False_wrap_alt_False_use_diff_optim_inputs_True (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9811463Z Tests :meth:`shard_full_optim_state_dict` for a non-FSDP-root model ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 9542 2023-01-11T21:18:34.9812004Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 9543 2023-01-11T21:18:34.9812589Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9813035Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9813606Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9814057Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9814631Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9815073Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9815704Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9816151Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9816843Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9817357Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9818000Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9818688Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9819206Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9819674Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9820648Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9821224Z warnings.warn( 2023-01-11T21:18:34.9822019Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9822571Z warnings.warn( 2023-01-11T21:18:34.9822800Z dist init r=0, world=2 2023-01-11T21:18:34.9823049Z dist init r=1, world=2 2023-01-11T21:18:34.9823288Z ok (5.212s) 2023-01-11T21:18:34.9823697Z test_shard_full_optim_state_dict_nested_use_multiple_param_groups_False_wrap_alt_True_use_diff_optim_inputs_False (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9824456Z Tests :meth:`shard_full_optim_state_dict` for a non-FSDP-root model ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 9625 2023-01-11T21:18:34.9824991Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 9626 2023-01-11T21:18:34.9825593Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9826020Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9826589Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9827062Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9827637Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9828059Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9828629Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9829096Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9829530Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9830026Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9830679Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9831361Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9831862Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9832333Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9833314Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9833876Z warnings.warn( 2023-01-11T21:18:34.9834644Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9835189Z warnings.warn( 2023-01-11T21:18:34.9835440Z dist init r=0, world=2 2023-01-11T21:18:34.9835693Z dist init r=1, world=2 2023-01-11T21:18:34.9835913Z ok (5.013s) 2023-01-11T21:18:34.9836339Z test_shard_full_optim_state_dict_nested_use_multiple_param_groups_False_wrap_alt_True_use_diff_optim_inputs_True (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9837094Z Tests :meth:`shard_full_optim_state_dict` for a non-FSDP-root model ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 9708 2023-01-11T21:18:34.9837657Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 9709 2023-01-11T21:18:34.9838270Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9838714Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9839283Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9839731Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9840304Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9840748Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9841299Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9841767Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9842222Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9842716Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9843351Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9844036Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9844552Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9845021Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9845908Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9846467Z warnings.warn( 2023-01-11T21:18:34.9847249Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9847803Z warnings.warn( 2023-01-11T21:18:34.9848032Z dist init r=0, world=2 2023-01-11T21:18:34.9848283Z dist init r=1, world=2 2023-01-11T21:18:34.9848521Z ok (5.012s) 2023-01-11T21:18:34.9848930Z test_shard_full_optim_state_dict_nested_use_multiple_param_groups_True_wrap_alt_False_use_diff_optim_inputs_False (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9849759Z Tests :meth:`shard_full_optim_state_dict` for a non-FSDP-root model ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 9791 2023-01-11T21:18:34.9850294Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 9792 2023-01-11T21:18:34.9850899Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9851328Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9851897Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9852362Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9852916Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9853357Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9853928Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9854433Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9854873Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9855369Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9856022Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9856888Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9857399Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9857866Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9858781Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9859344Z warnings.warn( 2023-01-11T21:18:34.9860113Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9860663Z warnings.warn( 2023-01-11T21:18:34.9860910Z dist init r=1, world=2 2023-01-11T21:18:34.9861146Z dist init r=0, world=2 2023-01-11T21:18:34.9861393Z ok (5.113s) 2023-01-11T21:18:34.9861818Z test_shard_full_optim_state_dict_nested_use_multiple_param_groups_True_wrap_alt_False_use_diff_optim_inputs_True (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9862573Z Tests :meth:`shard_full_optim_state_dict` for a non-FSDP-root model ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 9874 2023-01-11T21:18:34.9863085Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 9875 2023-01-11T21:18:34.9863690Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9864135Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9864683Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9865150Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9865725Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9866168Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9866812Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9867280Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9867731Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9868224Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9868857Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9869540Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9870060Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9870512Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9871472Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9872044Z warnings.warn( 2023-01-11T21:18:34.9872831Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9873377Z warnings.warn( 2023-01-11T21:18:34.9873608Z dist init r=0, world=2 2023-01-11T21:18:34.9873859Z dist init r=1, world=2 2023-01-11T21:18:34.9874099Z ok (5.112s) 2023-01-11T21:18:34.9874506Z test_shard_full_optim_state_dict_nested_use_multiple_param_groups_True_wrap_alt_True_use_diff_optim_inputs_False (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9875255Z Tests :meth:`shard_full_optim_state_dict` for a non-FSDP-root model ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 9957 2023-01-11T21:18:34.9875792Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 9958 2023-01-11T21:18:34.9876379Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9876828Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9877401Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9877868Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9878419Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9878864Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9879436Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9879903Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9880340Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9880832Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9881487Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9882150Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9882671Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9883140Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9884118Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9884660Z warnings.warn( 2023-01-11T21:18:34.9885446Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9885997Z warnings.warn( 2023-01-11T21:18:34.9886247Z dist init r=0, world=2 2023-01-11T21:18:34.9886479Z dist init r=1, world=2 2023-01-11T21:18:34.9886717Z ok (5.012s) 2023-01-11T21:18:34.9887147Z test_shard_full_optim_state_dict_nested_use_multiple_param_groups_True_wrap_alt_True_use_diff_optim_inputs_True (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9887885Z Tests :meth:`shard_full_optim_state_dict` for a non-FSDP-root model ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 10040 2023-01-11T21:18:34.9888470Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 10041 2023-01-11T21:18:34.9889084Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9889532Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9890080Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9890551Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9891126Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9891570Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9892123Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9892587Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9893040Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9893517Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9894174Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9894855Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9895374Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9895825Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9896914Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9897485Z warnings.warn( 2023-01-11T21:18:34.9898275Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9898802Z warnings.warn( 2023-01-11T21:18:34.9899051Z dist init r=0, world=2 2023-01-11T21:18:34.9899302Z dist init r=1, world=2 2023-01-11T21:18:34.9899522Z ok (5.012s) 2023-01-11T21:18:34.9899856Z test_shard_full_optim_state_dict_transformer (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9900551Z Tests :meth:`shard_full_optim_state_dict` for an FSDP-root ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 10123 2023-01-11T21:18:34.9901173Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 10124 2023-01-11T21:18:34.9901771Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9902221Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9902790Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9903241Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9903819Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9904261Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9904828Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9905278Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9905794Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9906301Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9906958Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9907629Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9908149Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9908617Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9909078Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:18:34.9909573Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:18:34.9910225Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:18:34.9910911Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:18:34.9911709Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2023-01-11T21:18:34.9912189Z warnings.warn( 2023-01-11T21:18:34.9913080Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T21:18:34.9914332Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T21:18:34.9915072Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T21:18:34.9915545Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T21:18:34.9916197Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T21:18:34.9916884Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T21:18:34.9917351Z dist init r=0, world=2 2023-01-11T21:18:34.9917585Z dist init r=1, world=2 2023-01-11T21:18:34.9917824Z ok (5.613s) 2023-01-11T21:18:34.9918276Z test_shard_full_optim_state_dict_unmanaged_params_state_dict_type_StateDictType_FULL_STATE_DICT_add_to_fsdp_module_False (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9918886Z Tests :meth:`shard_full_optim_state_dict` when there are unmanaged ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 10216 2023-01-11T21:18:34.9919415Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 10217 2023-01-11T21:18:34.9920024Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9920475Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9921027Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9921493Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9922069Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9922540Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9923121Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9923579Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9924029Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9924504Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9925159Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9925845Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9926370Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9987238Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:34.9988277Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9988850Z warnings.warn( 2023-01-11T21:18:34.9989627Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:34.9990176Z warnings.warn( 2023-01-11T21:18:34.9990409Z dist init r=0, world=2 2023-01-11T21:18:34.9990639Z dist init r=1, world=2 2023-01-11T21:18:34.9990849Z ok (4.812s) 2023-01-11T21:18:34.9991268Z test_shard_full_optim_state_dict_unmanaged_params_state_dict_type_StateDictType_FULL_STATE_DICT_add_to_fsdp_module_True (__main__.TestFSDPOptimState) 2023-01-11T21:18:34.9991873Z Tests :meth:`shard_full_optim_state_dict` when there are unmanaged ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 10299 2023-01-11T21:18:34.9992387Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 10300 2023-01-11T21:18:34.9993001Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9993445Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9994021Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9994635Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9995216Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:34.9995654Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:34.9996201Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:34.9996662Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:34.9997112Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:34.9997608Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:34.9998240Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:34.9998907Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:34.9999432Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:34.9999970Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:35.0000913Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:35.0001471Z warnings.warn( 2023-01-11T21:18:35.0002250Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1093: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2023-01-11T21:18:35.0002791Z warnings.warn( 2023-01-11T21:18:35.0003026Z dist init r=1, world=2 2023-01-11T21:18:35.0003276Z dist init r=0, world=2 2023-01-11T21:18:35.0003515Z ok (4.812s) 2023-01-11T21:18:35.0003840Z test_shard_full_optim_state_dict_unmanaged_params_state_dict_type_StateDictType_SHARDED_STATE_DICT_add_to_fsdp_module_False (__main__.TestFSDPOptimState) 2023-01-11T21:18:35.0004133Z Tests :meth:`shard_full_optim_state_dict` when there are unmanaged ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 10382 2023-01-11T21:18:35.0004355Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 10383 2023-01-11T21:18:35.0004727Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:35.0004904Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:35.0005280Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:35.0005478Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:35.0005840Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:35.0006016Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:35.0006389Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:35.0006559Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:35.0006805Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:35.0007048Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:35.0007450Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:35.0007844Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:35.0008140Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:35.0008369Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:35.0008984Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2533: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2023-01-11T21:18:35.0009094Z warnings.warn( 2023-01-11T21:18:35.0009688Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2533: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2023-01-11T21:18:35.0009801Z warnings.warn( 2023-01-11T21:18:35.0009905Z dist init r=1, world=2 2023-01-11T21:18:35.0010013Z dist init r=0, world=2 2023-01-11T21:18:35.0010116Z ok (4.812s) 2023-01-11T21:18:35.0010478Z test_shard_full_optim_state_dict_unmanaged_params_state_dict_type_StateDictType_SHARDED_STATE_DICT_add_to_fsdp_module_True (__main__.TestFSDPOptimState) 2023-01-11T21:18:35.0010796Z Tests :meth:`shard_full_optim_state_dict` when there are unmanaged ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 10465 2023-01-11T21:18:35.0011013Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 10466 2023-01-11T21:18:35.0011366Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:35.0011541Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:35.0011918Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:35.0012109Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:35.0012477Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:35.0012652Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:35.0013028Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:35.0013216Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:35.0013457Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:35.0013681Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:35.0014076Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:35.0014460Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:35.0014678Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:35.0014899Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:35.0015514Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2533: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2023-01-11T21:18:35.0015625Z warnings.warn( 2023-01-11T21:18:35.0016237Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2533: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2023-01-11T21:18:35.0016345Z warnings.warn( 2023-01-11T21:18:35.0016439Z dist init r=0, world=2 2023-01-11T21:18:35.0016826Z dist init r=1, world=2 2023-01-11T21:18:35.0017031Z ok (4.712s) 2023-01-11T21:18:35.0017204Z test_use_orig_params (__main__.TestFSDPOptimState) 2023-01-11T21:18:35.0017571Z Tests :meth:`optim_state_dict` for an FSDP-root nested model. ... skip: The test currently fails on CI. (0.001s) 2023-01-11T21:18:35.0017746Z test_use_orig_params_error (__main__.TestFSDPOptimState) 2023-01-11T21:18:35.0018056Z Tests that the optimizer state checkpointing APIs raise an error ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 10548 2023-01-11T21:18:35.0018270Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 10549 2023-01-11T21:18:35.0018620Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:35.0018794Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:35.0019165Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:35.0019355Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:35.0019774Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:18:35.0019951Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:18:35.0020326Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:18:35.0020509Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:18:35.0020734Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:18:35.0020973Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:18:35.0021369Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:18:35.0021763Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:18:35.0021988Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:18:35.0022210Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:18:35.0022321Z dist init r=0, world=2 2023-01-11T21:18:35.0022427Z dist init r=1, world=2 2023-01-11T21:18:35.0022522Z ok (4.712s) 2023-01-11T21:18:35.0022544Z 2023-01-11T21:18:35.0022795Z ---------------------------------------------------------------------- 2023-01-11T21:18:35.0022911Z Ran 55 tests in 261.764s 2023-01-11T21:18:35.0022930Z 2023-01-11T21:18:35.0023034Z OK (skipped=2) 2023-01-11T21:18:35.0023053Z 2023-01-11T21:18:35.0023174Z Generating XML reports... 2023-01-11T21:18:35.0023623Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_optim_state/TEST-TestFSDPOptimState-20230111211412.xml 2023-01-11T21:18:35.0023646Z 2023-01-11T21:18:35.0024027Z ##[endgroup] 2023-01-11T21:18:35.0024509Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_optim_state (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_optim_state_hs0nwjoo) 2023-01-11T21:18:35.0024531Z 2023-01-11T21:18:35.0024823Z Running distributed/algorithms/quantization/test_quantization ... [2023-01-11 21:18:34.930469] 2023-01-11T21:18:35.0024927Z /usr/bin/mpiexec 2023-01-11T21:18:35.0025148Z MPI not available -- MPI backend tests will be skipped 2023-01-11T21:18:35.0025527Z Map different backends to different shards for distributed/algorithms/quantization/test_quantization: {'gloo': 1, 'nccl': 2} 2023-01-11T21:18:35.0025652Z Shard 3: test should be run in 1 2023-01-11T21:18:35.0025771Z Shard 3: nccl should be run in 2 2023-01-11T21:18:35.0025891Z Shard 3: gloo should be run in 1 2023-01-11T21:18:35.0026005Z Shard 3: ucc should be run in 1 2023-01-11T21:18:35.0026256Z Running distributed/test_distributed_spawn ... 
[2023-01-11 21:18:34.931966] 2023-01-11T21:18:35.0026418Z /usr/bin/mpiexec 2023-01-11T21:18:35.0026640Z MPI not available -- MPI backend tests will be skipped 2023-01-11T21:18:35.0026998Z Map different backends to different shards for distributed/test_distributed_spawn: {'gloo': 1, 'nccl': 2, 'ucc': 3} 2023-01-11T21:18:35.0027119Z Shard 3: test should be run in 1 2023-01-11T21:18:35.0027239Z Shard 3: nccl should be run in 2 2023-01-11T21:18:35.0027360Z Shard 3: gloo should be run in 1 2023-01-11T21:18:35.0027551Z Running distributed tests for the ucc backend with env init_method in shard 3 of 3 2023-01-11T21:18:35.0028057Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '-v', '--subprocess', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:18:34.934190] 2023-01-11T21:44:28.4436521Z 2023-01-11T21:44:28.4437017Z Expand the folded group to see the log file of distributed/test_distributed_spawn 2023-01-11T21:44:28.4437954Z ##[group]PRINTING LOG FILE of distributed/test_distributed_spawn (/var/lib/jenkins/workspace/test/test-reports/distributed-test_distributed_spawn_fsaba4i5) 2023-01-11T21:44:28.4446365Z 2023-01-11T21:44:28.4483554Z , <__main__.TestDistBackendWithSpawn testMethod=test_3_level_hierarchical_model_averager>, <__main__.TestDistBackendWithSpawn testMethod=test_Backend_enum_class>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallelCPU>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallelCPU_grad_is_view>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel_SyncBatchNorm>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel_SyncBatchNorm_2D_Input>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel_SyncBatchNorm_Channels_Last>, <__main__.TestDistBackendWithSpawn 
testMethod=test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_Running_Value>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_gradient>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel_SyncBatchNorm_No_Affine>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel_SyncBatchNorm_Single_Input_Per_Process>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel_non_default_stream>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel_requires_grad>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel_with_amp_and_grad_is_view>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedSampler_padding>, <__main__.TestDistBackendWithSpawn testMethod=test_SyncBatchNorm_process_group>, <__main__.TestDistBackendWithSpawn testMethod=test_accumulate_gradients_no_sync>, <__main__.TestDistBackendWithSpawn testMethod=test_accumulate_gradients_no_sync_allreduce_hook>, <__main__.TestDistBackendWithSpawn testMethod=test_accumulate_gradients_no_sync_allreduce_with_then_hook>, <__main__.TestDistBackendWithSpawn testMethod=test_accumulate_gradients_no_sync_grad_is_view>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_coalesced_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_coalesced_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_coalesced_group>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_coalesced_simple>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_coalesced_with_empty>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_cuda_complex>, <__main__.TestDistBackendWithSpawn 
testMethod=test_all_gather_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_group>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_into_cat_tensor_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_into_stack_tensor_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_multigpu>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_multigpu_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_object_default_pg>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_object_subgroup>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_v_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_full_group_max>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_full_group_min>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_full_group_product>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_full_group_sum>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_group_max>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_group_min>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_group_product>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_group_sum>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_max>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_max_complex_unsupported>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_min>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_product>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_sum>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_complex_unsupported_ops>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_full_group_max>, <__main__.TestDistBackendWithSpawn 
testMethod=test_all_reduce_full_group_min>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_full_group_product>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_full_group_sum>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_group_max>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_group_min>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_group_product>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_group_sum>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_max>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_min>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_multigpu>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_multigpu_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_product>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_result_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_sum>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_sum_async>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_sum_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_sum_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_sum_cuda_async>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_sum_cuda_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_cuda_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_full_group_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_group>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_group_cuda>, 
<__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_equal_split>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_equal_split_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_equal_split_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_equal_split_cuda_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_equal_split_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_equal_split_full_group_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_equal_split_group>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_equal_split_group_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_unequal_split>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_unequal_split_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_unequal_split_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_unequal_split_cuda_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_unequal_split_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_unequal_split_full_group_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_unequal_split_group>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_unequal_split_group_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_average_parameters>, <__main__.TestDistBackendWithSpawn testMethod=test_backend_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_backend_group>, <__main__.TestDistBackendWithSpawn testMethod=test_barrier>, <__main__.TestDistBackendWithSpawn testMethod=test_barrier_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_barrier_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_barrier_full_group_cuda>, 
<__main__.TestDistBackendWithSpawn testMethod=test_barrier_group>, <__main__.TestDistBackendWithSpawn testMethod=test_barrier_group_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_barrier_timeout_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_barrier_timeout_global>, <__main__.TestDistBackendWithSpawn testMethod=test_barrier_timeout_group>, <__main__.TestDistBackendWithSpawn testMethod=test_batch_isend_irecv_gloo>, <__main__.TestDistBackendWithSpawn testMethod=test_batch_isend_irecv_gloo_tags>, <__main__.TestDistBackendWithSpawn testMethod=test_batch_isend_irecv_mixed_backend_err>, <__main__.TestDistBackendWithSpawn testMethod=test_batch_isend_irecv_nccl>, <__main__.TestDistBackendWithSpawn testMethod=test_batch_isend_irecv_no_rank_zero_nccl>, <__main__.TestDistBackendWithSpawn testMethod=test_batch_isend_irecv_op_err>, <__main__.TestDistBackendWithSpawn testMethod=test_batch_isend_irecv_op_list_err>, <__main__.TestDistBackendWithSpawn testMethod=test_batch_isend_irecv_ring_exchange_nccl>, <__main__.TestDistBackendWithSpawn testMethod=test_batch_isend_irecv_self_nccl>, <__main__.TestDistBackendWithSpawn testMethod=test_batch_isend_irecv_tensor_err>, <__main__.TestDistBackendWithSpawn testMethod=test_broadcast>, <__main__.TestDistBackendWithSpawn testMethod=test_broadcast_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_broadcast_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_broadcast_group>, <__main__.TestDistBackendWithSpawn testMethod=test_broadcast_multigpu>, <__main__.TestDistBackendWithSpawn testMethod=test_broadcast_object_list>, <__main__.TestDistBackendWithSpawn testMethod=test_compute_bucket_assignment_by_size_sparse_error_with_logger>, <__main__.TestDistBackendWithSpawn testMethod=test_compute_bucket_assignment_by_size_sparse_error_without_logger>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_apply_optim_in_backward>, <__main__.TestDistBackendWithSpawn 
testMethod=test_ddp_apply_optim_in_backward_grad_as_bucket_view_false>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_apply_optim_in_backward_ignored_params>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_broadcast_buffer>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_broadcast_buffer_via_hook>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_buffer_hook_allreduce>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_buffer_hook_allreduce_return_future>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_build_debug_param_to_name_mapping>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_build_debug_param_to_name_mapping_requires_grad>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_comm_hook_logging>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_control_flow_different_across_ranks>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_control_flow_same_across_ranks>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_create_graph>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_device>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_forward_backward_hook>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_grad_div_uneven_inputs>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_parity_allreduce>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_parity_allreduce_process_group>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_parity_post_localSGD>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_parity_powerSGD>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_pickling_powerSGD>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_adam_optimize_subset_False>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_adam_optimize_subset_True>, <__main__.TestDistBackendWithSpawn 
testMethod=test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_False_optimize_subset_False>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_False_optimize_subset_True>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_True_optimize_subset_False>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_True_optimize_subset_True>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_False_optimize_subset_False>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_False_optimize_subset_True>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_True_optimize_subset_False>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_True_optimize_subset_True>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_sgd_optimize_subset_False>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_sgd_optimize_subset_True>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_ignore_params_arg>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_inference>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_join_model_equivalence>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_logging_data_cpu>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_logging_data_gpu>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_model_diff_num_params_across_ranks>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_model_diff_shape_across_ranks>, 
<__main__.TestDistBackendWithSpawn testMethod=test_ddp_multiple_nested_unused_params_err_ignore_params>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_multiple_nested_unused_params_error>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_namedtuple>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_new_tensor_in_fwd>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_new_tensor_in_fwd_static_graph>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_profiling_autograd_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_profiling_torch_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_python_error_logged>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_returns_tensor_with_no_grad>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_shared_grad_acc_unused_params>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_static_graph_nested_types>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_sync_bn_training_vs_eval>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_sync_module_states>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_uneven_input_exception>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_uneven_input_join_disable>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_uneven_inputs>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_uneven_inputs_stop_iteration_sync_bn>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_unused_params_rebuild_buckets_exception>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_zero_output_features>, <__main__.TestDistBackendWithSpawn testMethod=test_destroy_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_destroy_group>, <__main__.TestDistBackendWithSpawn testMethod=test_detect_ddp_is_actually_static>, <__main__.TestDistBackendWithSpawn testMethod=test_different_graph_across_ranks>, <__main__.TestDistBackendWithSpawn testMethod=test_dump_DDP_relevant_env_vars>, 
<__main__.TestDistBackendWithSpawn testMethod=test_gather>, <__main__.TestDistBackendWithSpawn testMethod=test_gather_checks>, <__main__.TestDistBackendWithSpawn testMethod=test_gather_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_gather_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_gather_group>, <__main__.TestDistBackendWithSpawn testMethod=test_gather_object>, <__main__.TestDistBackendWithSpawn testMethod=test_gather_object_subgroup>, <__main__.TestDistBackendWithSpawn testMethod=test_get_backend>, <__main__.TestDistBackendWithSpawn testMethod=test_get_future>, <__main__.TestDistBackendWithSpawn testMethod=test_get_rank>, <__main__.TestDistBackendWithSpawn testMethod=test_get_rank_size_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_get_rank_size_group>, <__main__.TestDistBackendWithSpawn testMethod=test_invalid_static_graph>, <__main__.TestDistBackendWithSpawn testMethod=test_irecv>, <__main__.TestDistBackendWithSpawn testMethod=test_isend>, <__main__.TestDistBackendWithSpawn testMethod=test_isend_autograd_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_isend_torch_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_monitored_barrier_allreduce_hang>, <__main__.TestDistBackendWithSpawn testMethod=test_monitored_barrier_allreduce_hang_wait_all_ranks>, <__main__.TestDistBackendWithSpawn testMethod=test_monitored_barrier_failure_order>, <__main__.TestDistBackendWithSpawn testMethod=test_monitored_barrier_gloo>, <__main__.TestDistBackendWithSpawn testMethod=test_monitored_barrier_gloo_rank_0_timeout>, <__main__.TestDistBackendWithSpawn testMethod=test_monitored_barrier_gloo_subgroup>, <__main__.TestDistBackendWithSpawn testMethod=test_monitored_barrier_wait_all_ranks>, <__main__.TestDistBackendWithSpawn testMethod=test_nccl_backend_bool_allgather>, <__main__.TestDistBackendWithSpawn testMethod=test_nccl_backend_bool_allreduce>, <__main__.TestDistBackendWithSpawn 
testMethod=test_nccl_backend_bool_broadcast>, <__main__.TestDistBackendWithSpawn testMethod=test_nccl_backend_bool_reduce>, <__main__.TestDistBackendWithSpawn testMethod=test_nccl_high_priority_stream>, <__main__.TestDistBackendWithSpawn testMethod=test_new_subgroups>, <__main__.TestDistBackendWithSpawn testMethod=test_new_subgroups_by_enumeration>, <__main__.TestDistBackendWithSpawn testMethod=test_new_subgroups_by_enumeration_input_rank_exceeds_world_size>, <__main__.TestDistBackendWithSpawn testMethod=test_new_subgroups_by_enumeration_negative_input_rank>, <__main__.TestDistBackendWithSpawn testMethod=test_new_subgroups_group_size_exceeds_world_size>, <__main__.TestDistBackendWithSpawn testMethod=test_new_subgroups_overlap_not_allowed>, <__main__.TestDistBackendWithSpawn testMethod=test_new_subgroups_world_size_not_divisible_by_group_size>, <__main__.TestDistBackendWithSpawn testMethod=test_output_unused_in_loss_dict_module>, <__main__.TestDistBackendWithSpawn testMethod=test_output_unused_in_loss_tuple_module>, <__main__.TestDistBackendWithSpawn testMethod=test_periodic_model_averager>, <__main__.TestDistBackendWithSpawn testMethod=test_periodic_model_averager_param_group>, <__main__.TestDistBackendWithSpawn testMethod=test_post_localSGD_optimizer_parity>, <__main__.TestDistBackendWithSpawn testMethod=test_post_localSGD_optimizer_parity_grad_is_view>, <__main__.TestDistBackendWithSpawn testMethod=test_post_localSGD_optimizer_parity_with_hierarchical_sgd>, <__main__.TestDistBackendWithSpawn testMethod=test_post_localSGD_optimizer_parity_with_hierarchical_sgd_grad_is_view>, <__main__.TestDistBackendWithSpawn testMethod=test_post_localSGD_optimizer_step_reload>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_full_group_max>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_full_group_min>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_full_group_product>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_full_group_sum>, 
<__main__.TestDistBackendWithSpawn testMethod=test_reduce_group_max>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_group_min>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_group_product>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_group_sum>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_max>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_min>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_multigpu>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_product>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_scatter_tensor_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_scatter_v_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_sum>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_sum_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_sum_cuda_twice>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_sum_twice>, <__main__.TestDistBackendWithSpawn testMethod=test_scatter>, <__main__.TestDistBackendWithSpawn testMethod=test_scatter_checks>, <__main__.TestDistBackendWithSpawn testMethod=test_scatter_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_scatter_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_scatter_cuda_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_scatter_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_scatter_group>, <__main__.TestDistBackendWithSpawn testMethod=test_scatter_object_list>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_any_source>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_any_source_autograd_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_any_source_torch_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_autograd_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_nccl>, 
<__main__.TestDistBackendWithSpawn testMethod=test_send_recv_nccl_autograd_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_nccl_torch_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_torch_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_with_tag>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_with_tag_autograd_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_with_tag_torch_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_sparse_all_reduce_sum>, <__main__.TestDistBackendWithSpawn testMethod=test_sparse_all_reduce_sum_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_stateless_api_with_ddp>, <__main__.TestDistBackendWithSpawn testMethod=test_static_graph_api_cpu>, <__main__.TestDistBackendWithSpawn testMethod=test_sync_bn_logged>, <__main__.TestDistBackendWithSpawn testMethod=test_undefined_grad_parity_unused_parameters>, <__main__.TestDistBackendWithSpawn testMethod=test_verify_model_across_rank_with_logger>, <__main__.TestDistBackendWithSpawn testMethod=test_verify_model_across_rank_without_logger>]> 2023-01-11T21:44:28.4519133Z test_1_level_hierarchical_model_averager_equivalent_to_periodic_model_averager (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4520892Z test_3_level_hierarchical_model_averager (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4521331Z test_Backend_enum_class (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4521758Z test_DistributedDataParallel (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4522193Z test_DistributedDataParallelCPU (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4522745Z test_DistributedDataParallelCPU_grad_is_view (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4523668Z test_DistributedDataParallel_SyncBatchNorm (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4524700Z test_DistributedDataParallel_SyncBatchNorm_2D_Input 
(__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4525226Z test_DistributedDataParallel_SyncBatchNorm_Channels_Last (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4525779Z test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_Running_Value (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4526345Z test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_gradient (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4526879Z test_DistributedDataParallel_SyncBatchNorm_No_Affine (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4527389Z test_DistributedDataParallel_SyncBatchNorm_Single_Input_Per_Process (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4527915Z test_DistributedDataParallel_non_default_stream (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4528410Z test_DistributedDataParallel_requires_grad (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4528885Z test_DistributedDataParallel_with_amp_and_grad_is_view (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4529360Z test_DistributedSampler_padding (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4529798Z test_SyncBatchNorm_process_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4530232Z test_accumulate_gradients_no_sync (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4530669Z test_accumulate_gradients_no_sync_allreduce_hook (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4531156Z test_accumulate_gradients_no_sync_allreduce_with_then_hook (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4531633Z test_accumulate_gradients_no_sync_grad_is_view (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4532026Z test_all_gather (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4532433Z test_all_gather_coalesced_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4532875Z test_all_gather_coalesced_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4533306Z 
test_all_gather_coalesced_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4533711Z test_all_gather_coalesced_simple (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4534141Z test_all_gather_coalesced_with_empty (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4534552Z test_all_gather_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4534920Z test_all_gather_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4535322Z test_all_gather_cuda_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4535729Z test_all_gather_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4536127Z test_all_gather_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4536517Z test_all_gather_into_cat_tensor_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4537724Z test_all_gather_into_stack_tensor_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4538278Z test_all_gather_multigpu (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4538674Z test_all_gather_multigpu_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4539106Z test_all_gather_object_default_pg (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4539532Z test_all_gather_object_subgroup (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4539915Z test_all_gather_v_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4540340Z test_all_reduce_coalesced_full_group_max (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4540781Z test_all_reduce_coalesced_full_group_min (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4541227Z test_all_reduce_coalesced_full_group_product (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4541662Z test_all_reduce_coalesced_full_group_sum (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4542091Z test_all_reduce_coalesced_group_max (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4542522Z test_all_reduce_coalesced_group_min 
(__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4542945Z test_all_reduce_coalesced_group_product (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4543453Z test_all_reduce_coalesced_group_sum (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4543902Z test_all_reduce_coalesced_max (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4544702Z test_all_reduce_coalesced_max_complex_unsupported (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4545134Z test_all_reduce_coalesced_min (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4545556Z test_all_reduce_coalesced_product (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4545986Z test_all_reduce_coalesced_sum (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4546396Z test_all_reduce_complex_unsupported_ops (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4546829Z test_all_reduce_full_group_max (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4547249Z test_all_reduce_full_group_min (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4547666Z test_all_reduce_full_group_product (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4548071Z test_all_reduce_full_group_sum (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4548475Z test_all_reduce_group_max (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4548875Z test_all_reduce_group_min (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4549261Z test_all_reduce_group_product (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4549666Z test_all_reduce_group_sum (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4550058Z test_all_reduce_max (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4550417Z test_all_reduce_min (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4550804Z test_all_reduce_multigpu (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4551215Z test_all_reduce_multigpu_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4551635Z 
test_all_reduce_product (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4552009Z test_all_reduce_result_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4552403Z test_all_reduce_sum (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4552792Z test_all_reduce_sum_async (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4553173Z test_all_reduce_sum_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4553645Z test_all_reduce_sum_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4554054Z test_all_reduce_sum_cuda_async (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4554458Z test_all_reduce_sum_cuda_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4554855Z test_all_to_all (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4555525Z test_all_to_all_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4556140Z test_all_to_all_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4556515Z test_all_to_all_cuda_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4557021Z test_all_to_all_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4557439Z test_all_to_all_full_group_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4557825Z test_all_to_all_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4558220Z test_all_to_all_group_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4558631Z test_all_to_all_single_equal_split (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4559074Z test_all_to_all_single_equal_split_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4559497Z test_all_to_all_single_equal_split_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4559948Z test_all_to_all_single_equal_split_cuda_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4560412Z test_all_to_all_single_equal_split_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4560853Z 
test_all_to_all_single_equal_split_full_group_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4561309Z test_all_to_all_single_equal_split_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4561813Z test_all_to_all_single_equal_split_group_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4562270Z test_all_to_all_single_unequal_split (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4562693Z test_all_to_all_single_unequal_split_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4563142Z test_all_to_all_single_unequal_split_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4563631Z test_all_to_all_single_unequal_split_cuda_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4564076Z test_all_to_all_single_unequal_split_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4564546Z test_all_to_all_single_unequal_split_full_group_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4565026Z test_all_to_all_single_unequal_split_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4565490Z test_all_to_all_single_unequal_split_group_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4565908Z test_average_parameters (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4566310Z test_backend_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4566692Z test_backend_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4567045Z test_barrier (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4567416Z test_barrier_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4567806Z test_barrier_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4568179Z test_barrier_full_group_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4568573Z test_barrier_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4568962Z test_barrier_group_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4569371Z 
test_barrier_timeout_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4569767Z test_barrier_timeout_global (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4570169Z test_barrier_timeout_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4570575Z test_batch_isend_irecv_gloo (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4570968Z test_batch_isend_irecv_gloo_tags (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4571396Z test_batch_isend_irecv_mixed_backend_err (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4571816Z test_batch_isend_irecv_nccl (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4572237Z test_batch_isend_irecv_no_rank_zero_nccl (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4572644Z test_batch_isend_irecv_op_err (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4573058Z test_batch_isend_irecv_op_list_err (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4573495Z test_batch_isend_irecv_ring_exchange_nccl (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4573906Z test_batch_isend_irecv_self_nccl (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4574393Z test_batch_isend_irecv_tensor_err (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4574785Z test_broadcast (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4575147Z test_broadcast_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4575542Z test_broadcast_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4575942Z test_broadcast_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4576336Z test_broadcast_multigpu (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4577147Z test_broadcast_object_list (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4577620Z test_compute_bucket_assignment_by_size_sparse_error_with_logger (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4578139Z test_compute_bucket_assignment_by_size_sparse_error_without_logger 
(__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4578585Z test_ddp_apply_optim_in_backward (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4579056Z test_ddp_apply_optim_in_backward_grad_as_bucket_view_false (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4579541Z test_ddp_apply_optim_in_backward_ignored_params (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4580075Z test_ddp_broadcast_buffer (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4580488Z test_ddp_broadcast_buffer_via_hook (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4580914Z test_ddp_buffer_hook_allreduce (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4581382Z test_ddp_buffer_hook_allreduce_return_future (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4581814Z test_ddp_build_debug_param_to_name_mapping (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4582287Z test_ddp_build_debug_param_to_name_mapping_requires_grad (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4582739Z test_ddp_comm_hook_logging (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4583169Z test_ddp_control_flow_different_across_ranks (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4583592Z test_ddp_control_flow_same_across_ranks (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4584012Z test_ddp_create_graph (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4584395Z test_ddp_device (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4584773Z test_ddp_forward_backward_hook (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4585188Z test_ddp_grad_div_uneven_inputs (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4585611Z test_ddp_hook_parity_allreduce (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4586055Z test_ddp_hook_parity_allreduce_process_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4586481Z test_ddp_hook_parity_post_localSGD (__main__.TestDistBackendWithSpawn) 
2023-01-11T21:44:28.4586907Z test_ddp_hook_parity_powerSGD (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4587332Z test_ddp_hook_pickling_powerSGD (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4587779Z test_ddp_hook_with_optimizer_parity_adam_optimize_subset_False (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4588290Z test_ddp_hook_with_optimizer_parity_adam_optimize_subset_True (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4588859Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_False_optimize_subset_False (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4589477Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_False_optimize_subset_True (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4590059Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_True_optimize_subset_False (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4590673Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_True_optimize_subset_True (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4591270Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_False_optimize_subset_False (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4591963Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_False_optimize_subset_True (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4592568Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_True_optimize_subset_False (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4593154Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_True_optimize_subset_True (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4593707Z test_ddp_hook_with_optimizer_parity_sgd_optimize_subset_False (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4594212Z 
test_ddp_hook_with_optimizer_parity_sgd_optimize_subset_True (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4594663Z test_ddp_ignore_params_arg (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4595036Z test_ddp_inference (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4595463Z test_ddp_join_model_equivalence (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4595879Z test_ddp_logging_data_cpu (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4596314Z test_ddp_logging_data_gpu (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4596756Z test_ddp_model_diff_num_params_across_ranks (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4597201Z test_ddp_model_diff_shape_across_ranks (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4597650Z test_ddp_multiple_nested_unused_params_err_ignore_params (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4598129Z test_ddp_multiple_nested_unused_params_error (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4598551Z test_ddp_namedtuple (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4598955Z test_ddp_new_tensor_in_fwd (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4599350Z test_ddp_new_tensor_in_fwd_static_graph (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4599801Z test_ddp_profiling_autograd_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4600241Z test_ddp_profiling_torch_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4600643Z test_ddp_python_error_logged (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4601063Z test_ddp_returns_tensor_with_no_grad (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4601496Z test_ddp_shared_grad_acc_unused_params (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4601932Z test_ddp_static_graph_nested_types (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4602338Z test_ddp_sync_bn_training_vs_eval (__main__.TestDistBackendWithSpawn) 
2023-01-11T21:44:28.4602749Z test_ddp_sync_module_states (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4603162Z test_ddp_uneven_input_exception (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4603565Z test_ddp_uneven_input_join_disable (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4603996Z test_ddp_uneven_inputs (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4604424Z test_ddp_uneven_inputs_stop_iteration_sync_bn (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4604894Z test_ddp_unused_params_rebuild_buckets_exception (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4605314Z test_ddp_zero_output_features (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4605719Z test_destroy_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4606108Z test_destroy_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4606498Z test_detect_ddp_is_actually_static (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4606928Z test_different_graph_across_ranks (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4607349Z test_dump_DDP_relevant_env_vars (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4607712Z test_gather (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4608104Z test_gather_checks (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4608478Z test_gather_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4608938Z test_gather_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4609299Z test_gather_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4609687Z test_gather_object (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4610083Z test_gather_object_subgroup (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4610452Z test_get_backend (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4610822Z test_get_future (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4611190Z test_get_rank (__main__.TestDistBackendWithSpawn) 
2023-01-11T21:44:28.4611553Z test_get_rank_size_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4611951Z test_get_rank_size_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4612343Z test_invalid_static_graph (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4612712Z test_irecv (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4613039Z test_isend (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4613419Z test_isend_autograd_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4613824Z test_isend_torch_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4614272Z test_monitored_barrier_allreduce_hang (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4614742Z test_monitored_barrier_allreduce_hang_wait_all_ranks (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4615195Z test_monitored_barrier_failure_order (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4615589Z test_monitored_barrier_gloo (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4616010Z test_monitored_barrier_gloo_rank_0_timeout (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4616964Z test_monitored_barrier_gloo_subgroup (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4617450Z test_monitored_barrier_wait_all_ranks (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4617859Z test_nccl_backend_bool_allgather (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4618287Z test_nccl_backend_bool_allreduce (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4618706Z test_nccl_backend_bool_broadcast (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4619099Z test_nccl_backend_bool_reduce (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4619513Z test_nccl_high_priority_stream (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4619910Z test_new_subgroups (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4620310Z test_new_subgroups_by_enumeration 
(__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4620754Z test_new_subgroups_by_enumeration_input_rank_exceeds_world_size (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4621242Z test_new_subgroups_by_enumeration_negative_input_rank (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4621713Z test_new_subgroups_group_size_exceeds_world_size (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4622142Z test_new_subgroups_overlap_not_allowed (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4622601Z test_new_subgroups_world_size_not_divisible_by_group_size (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4623057Z test_output_unused_in_loss_dict_module (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4623490Z test_output_unused_in_loss_tuple_module (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4623895Z test_periodic_model_averager (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4624323Z test_periodic_model_averager_param_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4624760Z test_post_localSGD_optimizer_parity (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4625185Z test_post_localSGD_optimizer_parity_grad_is_view (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4625661Z test_post_localSGD_optimizer_parity_with_hierarchical_sgd (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4626170Z test_post_localSGD_optimizer_parity_with_hierarchical_sgd_grad_is_view (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4626769Z test_post_localSGD_optimizer_step_reload (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4627171Z test_reduce_full_group_max (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4627572Z test_reduce_full_group_min (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4627981Z test_reduce_full_group_product (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4628369Z test_reduce_full_group_sum (__main__.TestDistBackendWithSpawn) 
2023-01-11T21:44:28.4629072Z test_reduce_group_max (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4629776Z test_reduce_group_min (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4630502Z test_reduce_group_product (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4631265Z test_reduce_group_sum (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4632009Z test_reduce_max (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4632768Z test_reduce_min (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4633496Z test_reduce_multigpu (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4634271Z test_reduce_product (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4635235Z test_reduce_scatter_tensor_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4636043Z test_reduce_scatter_v_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4636744Z test_reduce_sum (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4637458Z test_reduce_sum_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4638167Z test_reduce_sum_cuda_twice (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4638912Z test_reduce_sum_twice (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4639622Z test_scatter (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4640386Z test_scatter_checks (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4641123Z test_scatter_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4641859Z test_scatter_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4642620Z test_scatter_cuda_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4643354Z test_scatter_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4644126Z test_scatter_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4644877Z test_scatter_object_list (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4645600Z test_send_recv (__main__.TestDistBackendWithSpawn) 
2023-01-11T21:44:28.4646362Z test_send_recv_any_source (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4647208Z test_send_recv_any_source_autograd_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4648107Z test_send_recv_any_source_torch_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4648961Z test_send_recv_autograd_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4649764Z test_send_recv_nccl (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4650593Z test_send_recv_nccl_autograd_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4651432Z test_send_recv_nccl_torch_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4652251Z test_send_recv_torch_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4653067Z test_send_recv_with_tag (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4653957Z test_send_recv_with_tag_autograd_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4654831Z test_send_recv_with_tag_torch_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4655657Z test_sparse_all_reduce_sum (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4656451Z test_sparse_all_reduce_sum_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4657659Z test_stateless_api_with_ddp (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4658483Z test_static_graph_api_cpu (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4659253Z test_sync_bn_logged (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4660074Z test_undefined_grad_parity_unused_parameters (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4661147Z test_verify_model_across_rank_with_logger (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4662060Z test_verify_model_across_rank_without_logger (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.4663461Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: 
loaded 56 slow tests 2023-01-11T21:44:28.4664328Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.4665497Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.4666449Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.4666933Z 2023-01-11T21:44:28.4667132Z Running tests... 2023-01-11T21:44:28.4667945Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.4669022Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.4670218Z test_1_level_hierarchical_model_averager_equivalent_to_periodic_model_averager (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.4671522Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 10701 2023-01-11T21:44:28.4672469Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 10702 2023-01-11T21:44:28.4673733Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.4674630Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.4675802Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.4676769Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.4677956Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.4678837Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.4680042Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.4681000Z warnings.warn(f"loaded 
{len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.4681922Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.4682914Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.4684256Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.4685670Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.4686691Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.4687661Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.4688688Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager:Model averaging hierarchy: 2023-01-11T21:44:28.4690416Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager: Each group that has 2 processes average parameters every 4 iterations, if no higher-level averaging. 2023-01-11T21:44:28.4691750Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager:Model averaging hierarchy: 2023-01-11T21:44:28.4693496Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager: Each group that has 2 processes average parameters every 4 iterations, if no higher-level averaging. 2023-01-11T21:44:28.4694832Z [1673471925.389724] [7c5487d9c02b:10701:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.4696091Z [1673471925.391958] [7c5487d9c02b:10702:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.4697585Z [1673471925.403518] [7c5487d9c02b:10701:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.4698504Z [1673471925.403518] [7c5487d9c02b:10701:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.4699427Z [1673471925.405081] [7c5487d9c02b:10702:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.4700364Z [1673471925.405081] [7c5487d9c02b:10702:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.4701379Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager:Model averaging hierarchy: 2023-01-11T21:44:28.4703100Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager: Each group that has 2 processes average parameters every 4 iterations, if no higher-level averaging. 2023-01-11T21:44:28.4704614Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager:Model averaging hierarchy: 2023-01-11T21:44:28.4706323Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager: Each group that has 2 processes average parameters every 4 iterations, if no higher-level averaging. 2023-01-11T21:44:28.4707680Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager:Model averaging hierarchy: 2023-01-11T21:44:28.4709355Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager: Each group that has 2 processes average parameters every 4 iterations, if no higher-level averaging. 2023-01-11T21:44:28.4710697Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager:Model averaging hierarchy: 2023-01-11T21:44:28.4712402Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager: Each group that has 2 processes average parameters every 4 iterations, if no higher-level averaging. 
2023-01-11T21:44:28.4713733Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager:Model averaging hierarchy: 2023-01-11T21:44:28.4715437Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager: Each group that has 2 processes average parameters every 4 iterations, if no higher-level averaging. 2023-01-11T21:44:28.4716815Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager:Model averaging hierarchy: 2023-01-11T21:44:28.4718530Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager: Each group that has 2 processes average parameters every 4 iterations, if no higher-level averaging. 2023-01-11T21:44:28.4719533Z ok (7.231s) 2023-01-11T21:44:28.4719791Z 2023-01-11T21:44:28.4720336Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.4720983Z Ran 1 test in 7.232s 2023-01-11T21:44:28.4721290Z 2023-01-11T21:44:28.4721464Z OK 2023-01-11T21:44:28.4721712Z 2023-01-11T21:44:28.4721925Z Generating XML reports... 2023-01-11T21:44:28.4723145Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111211839.xml 2023-01-11T21:44:28.4724627Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.4725548Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.4726685Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.4727638Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.4728102Z 2023-01-11T21:44:28.4728319Z Running tests... 
2023-01-11T21:44:28.4729113Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.4730402Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.4731484Z test_3_level_hierarchical_model_averager (__main__.TestDistBackendWithSpawn) ... skip: Test requires world size of 4 (0.003s) 2023-01-11T21:44:28.4732190Z 2023-01-11T21:44:28.4732767Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.4733376Z Ran 1 test in 0.003s 2023-01-11T21:44:28.4733686Z 2023-01-11T21:44:28.4733896Z OK (skipped=1) 2023-01-11T21:44:28.4734190Z 2023-01-11T21:44:28.4734441Z Generating XML reports... 2023-01-11T21:44:28.4735639Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111211848.xml 2023-01-11T21:44:28.4737545Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.4738479Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.4739679Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.4740622Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.4741318Z 2023-01-11T21:44:28.4741564Z Running tests... 2023-01-11T21:44:28.4742380Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.4743446Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.4744506Z test_Backend_enum_class (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.4745506Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 10849 2023-01-11T21:44:28.4746537Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 10850 2023-01-11T21:44:28.4747768Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.4748715Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.4749907Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.4750882Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.4752055Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.4752968Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.4754232Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.4755157Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.4756071Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.4757087Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.4758468Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.4759844Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.4760887Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.4761843Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.4762548Z ok (4.235s) 2023-01-11T21:44:28.4762817Z 2023-01-11T21:44:28.4763362Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.4764013Z Ran 1 test in 4.235s 2023-01-11T21:44:28.4764335Z 2023-01-11T21:44:28.4764509Z OK 2023-01-11T21:44:28.4764766Z 2023-01-11T21:44:28.4764989Z Generating XML reports... 2023-01-11T21:44:28.4766221Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111211851.xml 2023-01-11T21:44:28.4767911Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.4768823Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.4769971Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.4770902Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.4771358Z 2023-01-11T21:44:28.4771571Z Running tests... 2023-01-11T21:44:28.4772378Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.4773477Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.4774555Z test_DistributedDataParallel (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.4777264Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77317 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. 
(1.613s) 2023-01-11T21:44:28.4778376Z 2023-01-11T21:44:28.4778955Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.4779607Z Ran 1 test in 1.613s 2023-01-11T21:44:28.4779920Z 2023-01-11T21:44:28.4780137Z OK (skipped=1) 2023-01-11T21:44:28.4780436Z 2023-01-11T21:44:28.4780680Z Generating XML reports... 2023-01-11T21:44:28.4781899Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111211858.xml 2023-01-11T21:44:28.4783375Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.4784315Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.4785511Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.4786457Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.4786922Z 2023-01-11T21:44:28.4787126Z Running tests... 2023-01-11T21:44:28.4787942Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.4788984Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.4790061Z test_DistributedDataParallelCPU (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.4791093Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 10986 2023-01-11T21:44:28.4792039Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 10987 2023-01-11T21:44:28.4793256Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.4794174Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.4795370Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.4796328Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.4797497Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.4798418Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.4799642Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.4800575Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.4801461Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.4802637Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.4804001Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.4805338Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.4806346Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.4807253Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.4808164Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.4809150Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.4810174Z [1673471946.093629] [7c5487d9c02b:10986:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.4811198Z [1673471947.546481] [7c5487d9c02b:10986:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.4812207Z [1673471947.546481] [7c5487d9c02b:10986:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.4813246Z [1673471946.100161] [7c5487d9c02b:10987:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.4814233Z [1673471947.523489] [7c5487d9c02b:10987:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.4815160Z [1673471947.523489] [7c5487d9c02b:10987:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.4815831Z ok (6.207s) 2023-01-11T21:44:28.4816127Z 2023-01-11T21:44:28.4816969Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.4817650Z Ran 1 test in 6.207s 2023-01-11T21:44:28.4817961Z 2023-01-11T21:44:28.4818133Z OK 2023-01-11T21:44:28.4818365Z 2023-01-11T21:44:28.4818612Z Generating XML reports... 
2023-01-11T21:44:28.4819884Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111211902.xml 2023-01-11T21:44:28.4821375Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.4822252Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.4823441Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.4824391Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.4824847Z 2023-01-11T21:44:28.4825052Z Running tests... 2023-01-11T21:44:28.4825836Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.4826916Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.4828043Z test_DistributedDataParallelCPU_grad_is_view (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.4829095Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 11100 2023-01-11T21:44:28.4830010Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 11101 2023-01-11T21:44:28.4831273Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.4832189Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.4833370Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.4834299Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.4835699Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.4836624Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.4837803Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.4838747Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.4839657Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.4840636Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.4841967Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.4843397Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.4844483Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.4845600Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.4846596Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.4847597Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.4848664Z [1673471954.748851] [7c5487d9c02b:11101:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.4849622Z [1673471956.164524] [7c5487d9c02b:11101:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.4850581Z [1673471956.164524] [7c5487d9c02b:11101:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.4851619Z [1673471954.729037] [7c5487d9c02b:11100:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.4852635Z [1673471956.128615] [7c5487d9c02b:11100:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.4853530Z [1673471956.128615] [7c5487d9c02b:11100:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.4854262Z ok (6.091s) 2023-01-11T21:44:28.4854565Z 2023-01-11T21:44:28.4855127Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.4855751Z Ran 1 test in 6.091s 2023-01-11T21:44:28.4856066Z 2023-01-11T21:44:28.4856253Z OK 2023-01-11T21:44:28.4856517Z 2023-01-11T21:44:28.4857069Z Generating XML reports... 
2023-01-11T21:44:28.4858312Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111211910.xml 2023-01-11T21:44:28.4859756Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.4860747Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.4861978Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.4862980Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.4863433Z 2023-01-11T21:44:28.4863659Z Running tests... 2023-01-11T21:44:28.4864490Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.4865574Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.4866815Z test_DistributedDataParallel_SyncBatchNorm (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.4867879Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 11214 2023-01-11T21:44:28.4869005Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 11215 2023-01-11T21:44:28.4870288Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.4871182Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.4872373Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.4873333Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.4874529Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.4875459Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.4876668Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.4877631Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.4878517Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.4879680Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.4881061Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.4882517Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.4883546Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.4884507Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.4885578Z [1673471964.805060] [7c5487d9c02b:11215:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.4886572Z [1673471964.818333] [7c5487d9c02b:11215:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.4887522Z [1673471964.818333] [7c5487d9c02b:11215:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.4888548Z [1673471964.801308] [7c5487d9c02b:11214:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.4889557Z [1673471964.815041] [7c5487d9c02b:11214:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.4890466Z [1673471964.815041] [7c5487d9c02b:11214:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.4891439Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.4892454Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.4893464Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.4894425Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.4895418Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.4896401Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T21:44:28.4897753Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.4898710Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.4899704Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.4900685Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.4901812Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.4902818Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.4903554Z ok (7.254s) 2023-01-11T21:44:28.4903857Z 2023-01-11T21:44:28.4904422Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.4905034Z Ran 1 test in 7.254s 2023-01-11T21:44:28.4905342Z 2023-01-11T21:44:28.4905512Z OK 2023-01-11T21:44:28.4905790Z 2023-01-11T21:44:28.4906036Z Generating XML reports... 2023-01-11T21:44:28.4907216Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111211919.xml 2023-01-11T21:44:28.4908737Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.4909646Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.4910836Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.4911774Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.4912448Z 2023-01-11T21:44:28.4912694Z Running tests... 
2023-01-11T21:44:28.4913537Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.4914582Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.4915719Z test_DistributedDataParallel_SyncBatchNorm_2D_Input (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.4916826Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 11332 2023-01-11T21:44:28.4917743Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 11333 2023-01-11T21:44:28.4919000Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.4919930Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.4921127Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.4922056Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.4923248Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.4924179Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.4925379Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.4926293Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.4927208Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.4928206Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.4929564Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.4930939Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.4931995Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.4932921Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.4933928Z [1673471974.569240] [7c5487d9c02b:11333:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.4934940Z [1673471974.582563] [7c5487d9c02b:11333:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.4935882Z [1673471974.582563] [7c5487d9c02b:11333:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.4937559Z [1673471974.568731] [7c5487d9c02b:11332:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.4938583Z [1673471974.582418] [7c5487d9c02b:11332:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.4939507Z [1673471974.582418] [7c5487d9c02b:11332:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.4940480Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.4941461Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.4942418Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.4943411Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T21:44:28.4944130Z ok (6.127s) 2023-01-11T21:44:28.4944421Z 2023-01-11T21:44:28.4944988Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.4945781Z Ran 1 test in 6.128s 2023-01-11T21:44:28.4946130Z 2023-01-11T21:44:28.4946327Z OK 2023-01-11T21:44:28.4946588Z 2023-01-11T21:44:28.4946837Z Generating XML reports... 2023-01-11T21:44:28.4948043Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111211929.xml 2023-01-11T21:44:28.4949489Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.4950398Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.4951591Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.4952532Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.4953003Z 2023-01-11T21:44:28.4953224Z Running tests... 2023-01-11T21:44:28.4954124Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.4955193Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.4956357Z test_DistributedDataParallel_SyncBatchNorm_Channels_Last (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.4957482Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 11450 2023-01-11T21:44:28.4958411Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 11451 2023-01-11T21:44:28.4959646Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.4960564Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.4961723Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.4962689Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.4963878Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.4964794Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.4966010Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.4966964Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.4967874Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.4968830Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.4970179Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.4971786Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.4972846Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.4973808Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.4974876Z [1673471983.244711] [7c5487d9c02b:11451:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.4975875Z [1673471983.258160] [7c5487d9c02b:11451:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.4977121Z [1673471983.258160] [7c5487d9c02b:11451:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.4978156Z [1673471983.237357] [7c5487d9c02b:11450:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.4979346Z [1673471983.250998] [7c5487d9c02b:11450:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.4980304Z [1673471983.250998] [7c5487d9c02b:11450:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.4981258Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.4982270Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.4983281Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.4984208Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.4985190Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.4986181Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T21:44:28.4987139Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.4988111Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.4988820Z ok (6.226s) 2023-01-11T21:44:28.4989107Z 2023-01-11T21:44:28.4989676Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.4990293Z Ran 1 test in 6.226s 2023-01-11T21:44:28.4990620Z 2023-01-11T21:44:28.4990798Z OK 2023-01-11T21:44:28.4991044Z 2023-01-11T21:44:28.4991279Z Generating XML reports... 2023-01-11T21:44:28.4992475Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111211938.xml 2023-01-11T21:44:28.4993935Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.4994863Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.4996049Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.4996978Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.4997439Z 2023-01-11T21:44:28.4997649Z Running tests... 2023-01-11T21:44:28.4998471Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.4999506Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5000693Z test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_Running_Value (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.5001844Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 11568 2023-01-11T21:44:28.5002777Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 11569 2023-01-11T21:44:28.5004231Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5005163Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5006374Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5007331Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5008539Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5009435Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5010624Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5011567Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5012499Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.5013638Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.5015013Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.5016388Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.5017755Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.5018711Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.5019707Z [1673471992.060714] [7c5487d9c02b:11568:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.5020694Z [1673471992.074545] [7c5487d9c02b:11568:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5021649Z [1673471992.074545] [7c5487d9c02b:11568:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5022669Z [1673471992.069431] [7c5487d9c02b:11569:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.5023684Z [1673471992.082709] [7c5487d9c02b:11569:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5024603Z [1673471992.082709] [7c5487d9c02b:11569:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5025596Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.5026599Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.5027305Z ok (6.347s) 2023-01-11T21:44:28.5027594Z 2023-01-11T21:44:28.5028168Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5028814Z Ran 1 test in 6.347s 2023-01-11T21:44:28.5029137Z 2023-01-11T21:44:28.5029314Z OK 2023-01-11T21:44:28.5029547Z 2023-01-11T21:44:28.5029792Z Generating XML reports... 
2023-01-11T21:44:28.5031011Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111211946.xml 2023-01-11T21:44:28.5032475Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5033365Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5034566Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5035528Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5036193Z 2023-01-11T21:44:28.5036397Z Running tests... 2023-01-11T21:44:28.5037190Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5038270Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5039451Z test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_gradient (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.5040554Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 11686 2023-01-11T21:44:28.5041475Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 11687 2023-01-11T21:44:28.5042751Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5043664Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5044799Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5045862Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5047245Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5048185Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5049367Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5050343Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5051279Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.5052267Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.5053641Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.5055157Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.5056244Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.5057506Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.5058544Z [1673472000.958026] [7c5487d9c02b:11687:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.5059575Z [1673472000.971335] [7c5487d9c02b:11687:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5060520Z [1673472000.971335] [7c5487d9c02b:11687:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5148795Z [1673472000.952825] [7c5487d9c02b:11686:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.5149953Z [1673472000.966686] [7c5487d9c02b:11686:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5150922Z [1673472000.966686] [7c5487d9c02b:11686:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5151900Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.5152861Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.5153935Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.5154917Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T21:44:28.5155593Z ok (6.956s) 2023-01-11T21:44:28.5155888Z 2023-01-11T21:44:28.5156463Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5157333Z Ran 1 test in 6.956s 2023-01-11T21:44:28.5157648Z 2023-01-11T21:44:28.5157821Z OK 2023-01-11T21:44:28.5158061Z 2023-01-11T21:44:28.5158303Z Generating XML reports... 2023-01-11T21:44:28.5159549Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111211955.xml 2023-01-11T21:44:28.5161012Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5161903Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5163072Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5164018Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5164467Z 2023-01-11T21:44:28.5164676Z Running tests... 2023-01-11T21:44:28.5165440Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5166494Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5167758Z test_DistributedDataParallel_SyncBatchNorm_No_Affine (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.5168820Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 11804 2023-01-11T21:44:28.5169726Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 11805 2023-01-11T21:44:28.5170973Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5171874Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5173021Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5173975Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5175167Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5176048Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5177630Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5178566Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5179448Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.5180428Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.5181810Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.5183246Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.5184321Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.5185283Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.5186309Z [1673472010.450108] [7c5487d9c02b:11805:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.5187306Z [1673472010.463399] [7c5487d9c02b:11805:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5188232Z [1673472010.463399] [7c5487d9c02b:11805:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5189244Z [1673472010.447175] [7c5487d9c02b:11804:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.5190265Z [1673472010.460801] [7c5487d9c02b:11804:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5191394Z [1673472010.460801] [7c5487d9c02b:11804:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5192323Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.5193290Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.5194270Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.5195242Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T21:44:28.5195918Z ok (6.629s) 2023-01-11T21:44:28.5196193Z 2023-01-11T21:44:28.5196738Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5197371Z Ran 1 test in 6.629s 2023-01-11T21:44:28.5197666Z 2023-01-11T21:44:28.5197820Z OK 2023-01-11T21:44:28.5198054Z 2023-01-11T21:44:28.5198292Z Generating XML reports... 2023-01-11T21:44:28.5199513Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212005.xml 2023-01-11T21:44:28.5201154Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5202058Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5203271Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5204199Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5204646Z 2023-01-11T21:44:28.5204848Z Running tests... 2023-01-11T21:44:28.5205655Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5206721Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5207899Z test_DistributedDataParallel_SyncBatchNorm_Single_Input_Per_Process (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.5209026Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 11922 2023-01-11T21:44:28.5209924Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 11923 2023-01-11T21:44:28.5211163Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5212036Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5213182Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5214117Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5215307Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5216244Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5217744Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5218690Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5219587Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.5220564Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.5221903Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.5223303Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.5224349Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.5225481Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.5226522Z [1673472019.589131] [7c5487d9c02b:11923:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.5227537Z [1673472019.602645] [7c5487d9c02b:11923:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5228465Z [1673472019.602645] [7c5487d9c02b:11923:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5229452Z [1673472019.588705] [7c5487d9c02b:11922:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.5230450Z [1673472019.602687] [7c5487d9c02b:11922:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5231378Z [1673472019.602687] [7c5487d9c02b:11922:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5232311Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.5233423Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.5234408Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.5235379Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T21:44:28.5236053Z ok (6.147s) 2023-01-11T21:44:28.5236341Z 2023-01-11T21:44:28.5236895Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5237531Z Ran 1 test in 6.147s 2023-01-11T21:44:28.5237830Z 2023-01-11T21:44:28.5237991Z OK 2023-01-11T21:44:28.5238227Z 2023-01-11T21:44:28.5238471Z Generating XML reports... 2023-01-11T21:44:28.5239705Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212014.xml 2023-01-11T21:44:28.5241180Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5242066Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5243233Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5244186Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5244638Z 2023-01-11T21:44:28.5244850Z Running tests... 2023-01-11T21:44:28.5245633Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5246689Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5247764Z test_DistributedDataParallel_non_default_stream (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.5249920Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/76428 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. 
(1.618s) 2023-01-11T21:44:28.5251005Z 2023-01-11T21:44:28.5251533Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5252179Z Ran 1 test in 1.618s 2023-01-11T21:44:28.5252485Z 2023-01-11T21:44:28.5252687Z OK (skipped=1) 2023-01-11T21:44:28.5252975Z 2023-01-11T21:44:28.5253203Z Generating XML reports... 2023-01-11T21:44:28.5254484Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212023.xml 2023-01-11T21:44:28.5255949Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5257250Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5258650Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5259630Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5260081Z 2023-01-11T21:44:28.5260277Z Running tests... 2023-01-11T21:44:28.5261072Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5262108Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5263206Z test_DistributedDataParallel_requires_grad (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.5264260Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 12074 2023-01-11T21:44:28.5265148Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 12075 2023-01-11T21:44:28.5266433Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5267334Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5268686Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5269624Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5270812Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5271702Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5272868Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5273792Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5274664Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.5275659Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.5276972Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.5278406Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.5279461Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.5280419Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.5281078Z ok (4.303s) 2023-01-11T21:44:28.5281360Z 2023-01-11T21:44:28.5281901Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5282538Z Ran 1 test in 4.303s 2023-01-11T21:44:28.5282842Z 2023-01-11T21:44:28.5282990Z OK 2023-01-11T21:44:28.5283246Z 2023-01-11T21:44:28.5283481Z Generating XML reports... 2023-01-11T21:44:28.5284717Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212027.xml 2023-01-11T21:44:28.5286242Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5287116Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5288319Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5289282Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5289753Z 2023-01-11T21:44:28.5289970Z Running tests... 2023-01-11T21:44:28.5290755Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5291814Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5292959Z test_DistributedDataParallel_with_amp_and_grad_is_view (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.5295458Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77294 for platform(s) linux. 
If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.635s) 2023-01-11T21:44:28.5296516Z 2023-01-11T21:44:28.5297400Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5298044Z Ran 1 test in 1.635s 2023-01-11T21:44:28.5298351Z 2023-01-11T21:44:28.5298540Z OK (skipped=1) 2023-01-11T21:44:28.5298831Z 2023-01-11T21:44:28.5299055Z Generating XML reports... 2023-01-11T21:44:28.5300246Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212033.xml 2023-01-11T21:44:28.5301707Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5302632Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5303954Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5304938Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5305417Z 2023-01-11T21:44:28.5305630Z Running tests... 2023-01-11T21:44:28.5306439Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5307521Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5308575Z test_DistributedSampler_padding (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.5309603Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 12211 2023-01-11T21:44:28.5310499Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 12212 2023-01-11T21:44:28.5311764Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5312678Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5313841Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5314791Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5315944Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5316838Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5317991Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5318945Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5319861Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.5320858Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.5322197Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.5323623Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.5324699Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.5325597Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.5326641Z [1673472043.336753] [7c5487d9c02b:12212:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.5327613Z [1673472043.350161] [7c5487d9c02b:12212:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5328814Z [1673472043.350161] [7c5487d9c02b:12212:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5329813Z [1673472043.330495] [7c5487d9c02b:12211:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.5330811Z [1673472043.344422] [7c5487d9c02b:12211:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5331774Z [1673472043.344422] [7c5487d9c02b:12211:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5332442Z ok (6.114s) 2023-01-11T21:44:28.5332744Z 2023-01-11T21:44:28.5333286Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5333922Z Ran 1 test in 6.114s 2023-01-11T21:44:28.5334236Z 2023-01-11T21:44:28.5334417Z OK 2023-01-11T21:44:28.5334667Z 2023-01-11T21:44:28.5334900Z Generating XML reports... 
2023-01-11T21:44:28.5336201Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212038.xml 2023-01-11T21:44:28.5338000Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5338920Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5340054Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5340986Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5341438Z 2023-01-11T21:44:28.5341646Z Running tests... 2023-01-11T21:44:28.5342442Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5343494Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5344526Z test_SyncBatchNorm_process_group (__main__.TestDistBackendWithSpawn) ... skip: no torchvision (0.002s) 2023-01-11T21:44:28.5345096Z 2023-01-11T21:44:28.5345645Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5346386Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5346703Z 2023-01-11T21:44:28.5346919Z OK (skipped=1) 2023-01-11T21:44:28.5347210Z 2023-01-11T21:44:28.5347455Z Generating XML reports... 
2023-01-11T21:44:28.5348668Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212046.xml 2023-01-11T21:44:28.5350096Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5351005Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5352208Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5353161Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5353597Z 2023-01-11T21:44:28.5353807Z Running tests... 2023-01-11T21:44:28.5354704Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5355776Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5356641Z test_accumulate_gradients_no_sync (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.5357631Z Runs _test_accumulate_gradients_no_sync using default inputs ... skip: get_future is only supported on mpi, nccl and gloo (0.002s) 2023-01-11T21:44:28.5358255Z 2023-01-11T21:44:28.5358786Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5359438Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5359722Z 2023-01-11T21:44:28.5359916Z OK (skipped=1) 2023-01-11T21:44:28.5360198Z 2023-01-11T21:44:28.5360430Z Generating XML reports... 
2023-01-11T21:44:28.5361881Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212049.xml 2023-01-11T21:44:28.5363320Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5364238Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5365387Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5366333Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5366762Z 2023-01-11T21:44:28.5366973Z Running tests... 2023-01-11T21:44:28.5367781Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5368843Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5369779Z test_accumulate_gradients_no_sync_allreduce_hook (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.5370826Z Runs multiple iterations on _test_accumulate_gradients_no_sync ... skip: get_future is only supported on mpi, nccl and gloo (0.002s) 2023-01-11T21:44:28.5371433Z 2023-01-11T21:44:28.5372165Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5372855Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5373157Z 2023-01-11T21:44:28.5373341Z OK (skipped=1) 2023-01-11T21:44:28.5373640Z 2023-01-11T21:44:28.5373878Z Generating XML reports... 
2023-01-11T21:44:28.5375109Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212051.xml 2023-01-11T21:44:28.5376867Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5377773Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5378942Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5379912Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5380359Z 2023-01-11T21:44:28.5380544Z Running tests... 2023-01-11T21:44:28.5381325Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5382400Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5383383Z test_accumulate_gradients_no_sync_allreduce_with_then_hook (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.5384473Z Runs multiple iterations on _test_accumulate_gradients_no_sync using allreduce ... skip: get_future is only supported on mpi, nccl and gloo (0.002s) 2023-01-11T21:44:28.5385152Z 2023-01-11T21:44:28.5385693Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5386333Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5386645Z 2023-01-11T21:44:28.5386833Z OK (skipped=1) 2023-01-11T21:44:28.5387142Z 2023-01-11T21:44:28.5387377Z Generating XML reports... 
2023-01-11T21:44:28.5388602Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212053.xml 2023-01-11T21:44:28.5390080Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5390970Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5392157Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5393095Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5393529Z 2023-01-11T21:44:28.5393720Z Running tests... 2023-01-11T21:44:28.5394477Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5395538Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5396693Z test_accumulate_gradients_no_sync_grad_is_view (__main__.TestDistBackendWithSpawn) 2023-01-11T21:44:28.5397715Z Runs _test_accumulate_gradients_no_sync using default inputs ... skip: get_future is only supported on mpi, nccl and gloo (0.002s) 2023-01-11T21:44:28.5398310Z 2023-01-11T21:44:28.5398847Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5399474Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5399794Z 2023-01-11T21:44:28.5399999Z OK (skipped=1) 2023-01-11T21:44:28.5400289Z 2023-01-11T21:44:28.5400538Z Generating XML reports... 
2023-01-11T21:44:28.5401780Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212056.xml 2023-01-11T21:44:28.5403238Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5404138Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5405315Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5406269Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5406726Z 2023-01-11T21:44:28.5407120Z Running tests... 2023-01-11T21:44:28.5407935Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5409002Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5409987Z test_all_gather (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.5410936Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 12490 2023-01-11T21:44:28.5411846Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 12491 2023-01-11T21:44:28.5413097Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5413970Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5415158Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5416143Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5417653Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5418499Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5419660Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5420631Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5421542Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.5422517Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.5423852Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.5425246Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.5426317Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.5427508Z STAGE:2023-01-11 21:21:02 12490:12490 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5428460Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.5429643Z STAGE:2023-01-11 21:21:02 12491:12491 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5430673Z [1673472062.627147] [7c5487d9c02b:12490:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.5431851Z [1673472064.241340] [7c5487d9c02b:12490:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5432782Z [1673472064.241340] [7c5487d9c02b:12490:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5433761Z [1673472062.628745] [7c5487d9c02b:12491:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.5434696Z [1673472064.249401] [7c5487d9c02b:12491:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5435606Z [1673472064.249401] [7c5487d9c02b:12491:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5437210Z STAGE:2023-01-11 21:21:04 12490:12490 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:21:04 12491:12491 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5437994Z 2023-01-11T21:44:28.5438728Z STAGE:2023-01-11 21:21:04 12491:12491 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5440140Z STAGE:2023-01-11 21:21:04 12490:12490 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5441314Z STAGE:2023-01-11 21:21:04 12490:12490 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5442474Z STAGE:2023-01-11 21:21:04 12491:12491 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5443633Z STAGE:2023-01-11 21:21:04 12490:12490 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5444796Z STAGE:2023-01-11 21:21:04 12491:12491 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5445984Z STAGE:2023-01-11 21:21:04 12490:12490 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5447234Z STAGE:2023-01-11 21:21:04 12491:12491 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5447962Z ok (6.561s) 2023-01-11T21:44:28.5448262Z 2023-01-11T21:44:28.5448794Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5449458Z Ran 1 test in 6.561s 2023-01-11T21:44:28.5449790Z 2023-01-11T21:44:28.5449976Z OK 2023-01-11T21:44:28.5450236Z 2023-01-11T21:44:28.5450496Z Generating XML reports... 
2023-01-11T21:44:28.5451691Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212058.xml 2023-01-11T21:44:28.5453168Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5454137Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5455254Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5456164Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5456938Z 2023-01-11T21:44:28.5457178Z Running tests... 2023-01-11T21:44:28.5457912Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5458843Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5459842Z test_all_gather_coalesced_complex (__main__.TestDistBackendWithSpawn) ... skip: ucc does not support all_gather_coalesced (0.002s) 2023-01-11T21:44:28.5460441Z 2023-01-11T21:44:28.5460948Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5461537Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5461826Z 2023-01-11T21:44:28.5462013Z OK (skipped=1) 2023-01-11T21:44:28.5462285Z 2023-01-11T21:44:28.5462501Z Generating XML reports... 
2023-01-11T21:44:28.5463653Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212107.xml 2023-01-11T21:44:28.5465195Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5466085Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5467198Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5468068Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5468484Z 2023-01-11T21:44:28.5468682Z Running tests... 2023-01-11T21:44:28.5469430Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5470414Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5471426Z test_all_gather_coalesced_full_group (__main__.TestDistBackendWithSpawn) ... skip: ucc does not support all_gather_coalesced (0.002s) 2023-01-11T21:44:28.5472048Z 2023-01-11T21:44:28.5472548Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5473142Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5473428Z 2023-01-11T21:44:28.5473620Z OK (skipped=1) 2023-01-11T21:44:28.5473881Z 2023-01-11T21:44:28.5474219Z Generating XML reports... 
2023-01-11T21:44:28.5475389Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212110.xml 2023-01-11T21:44:28.5476761Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5477587Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5478696Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5479591Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5480024Z 2023-01-11T21:44:28.5480223Z Running tests... 2023-01-11T21:44:28.5480971Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5481971Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5482996Z test_all_gather_coalesced_group (__main__.TestDistBackendWithSpawn) ... skip: ucc does not support all_gather_coalesced (0.002s) 2023-01-11T21:44:28.5483603Z 2023-01-11T21:44:28.5484113Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5484691Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5484981Z 2023-01-11T21:44:28.5485170Z OK (skipped=1) 2023-01-11T21:44:28.5485447Z 2023-01-11T21:44:28.5485669Z Generating XML reports... 
2023-01-11T21:44:28.5486806Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212112.xml 2023-01-11T21:44:28.5488191Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5489086Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5490223Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5491143Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5491602Z 2023-01-11T21:44:28.5491820Z Running tests... 2023-01-11T21:44:28.5492644Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5493673Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5494729Z test_all_gather_coalesced_simple (__main__.TestDistBackendWithSpawn) ... skip: ucc does not support all_gather_coalesced (0.002s) 2023-01-11T21:44:28.5495351Z 2023-01-11T21:44:28.5495862Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5496485Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5497109Z 2023-01-11T21:44:28.5497340Z OK (skipped=1) 2023-01-11T21:44:28.5497821Z 2023-01-11T21:44:28.5498052Z Generating XML reports... 
2023-01-11T21:44:28.5499245Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212114.xml 2023-01-11T21:44:28.5500644Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5501014Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5501747Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5502085Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5502112Z 2023-01-11T21:44:28.5502300Z Running tests... 2023-01-11T21:44:28.5502800Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5503384Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5503924Z test_all_gather_coalesced_with_empty (__main__.TestDistBackendWithSpawn) ... skip: ucc does not support all_gather_coalesced (0.002s) 2023-01-11T21:44:28.5503953Z 2023-01-11T21:44:28.5504541Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5504720Z Ran 1 test in 0.003s 2023-01-11T21:44:28.5504751Z 2023-01-11T21:44:28.5504932Z OK (skipped=1) 2023-01-11T21:44:28.5504959Z 2023-01-11T21:44:28.5505164Z Generating XML reports... 
2023-01-11T21:44:28.5505984Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212117.xml 2023-01-11T21:44:28.5506664Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5506978Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5507684Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5508026Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5508053Z 2023-01-11T21:44:28.5508233Z Running tests... 2023-01-11T21:44:28.5508682Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5509258Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5509726Z test_all_gather_complex (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.5510126Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 12769 2023-01-11T21:44:28.5510515Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 12770 2023-01-11T21:44:28.5511221Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5511531Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5512249Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5512560Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5513265Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5513576Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5514289Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5514630Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5515067Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.5515498Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.5516271Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.5517124Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.5517529Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.5518150Z STAGE:2023-01-11 21:21:23 12769:12769 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5518566Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.5519188Z STAGE:2023-01-11 21:21:23 12770:12770 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5519687Z [1673472083.565263] [7c5487d9c02b:12769:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.5520102Z [1673472085.210049] [7c5487d9c02b:12769:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5520613Z [1673472085.210049] [7c5487d9c02b:12769:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5521120Z [1673472083.566873] [7c5487d9c02b:12770:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.5521544Z [1673472085.235407] [7c5487d9c02b:12770:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5521981Z [1673472085.235407] [7c5487d9c02b:12770:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5523017Z STAGE:2023-01-11 21:21:25 12769:12769 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:21:25 12770:12770 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5523072Z 2023-01-11T21:44:28.5523729Z STAGE:2023-01-11 21:21:25 12770:12770 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5524402Z STAGE:2023-01-11 21:21:25 12769:12769 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5525023Z STAGE:2023-01-11 21:21:25 12770:12770 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5525640Z STAGE:2023-01-11 21:21:25 12769:12769 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5526270Z STAGE:2023-01-11 21:21:25 12770:12770 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5526893Z STAGE:2023-01-11 21:21:25 12769:12769 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5527557Z STAGE:2023-01-11 21:21:25 12770:12770 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5528218Z STAGE:2023-01-11 21:21:25 12769:12769 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5528385Z ok (6.648s) 2023-01-11T21:44:28.5528437Z 2023-01-11T21:44:28.5528908Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5529114Z Ran 1 test in 6.648s 2023-01-11T21:44:28.5529140Z 2023-01-11T21:44:28.5529298Z OK 2023-01-11T21:44:28.5529324Z 2023-01-11T21:44:28.5529538Z Generating XML reports... 
2023-01-11T21:44:28.5530403Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212119.xml 2023-01-11T21:44:28.5531121Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5531443Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5532170Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5532499Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5532614Z 2023-01-11T21:44:28.5532818Z Running tests... 2023-01-11T21:44:28.5533309Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5533912Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5534399Z test_all_gather_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all gather (0.002s) 2023-01-11T21:44:28.5534426Z 2023-01-11T21:44:28.5534916Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5535116Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5535143Z 2023-01-11T21:44:28.5535330Z OK (skipped=1) 2023-01-11T21:44:28.5535356Z 2023-01-11T21:44:28.5535575Z Generating XML reports... 
2023-01-11T21:44:28.5536425Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212128.xml 2023-01-11T21:44:28.5537425Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5537758Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5538634Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5539006Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5539036Z 2023-01-11T21:44:28.5539226Z Running tests... 2023-01-11T21:44:28.5539726Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5540324Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5540818Z test_all_gather_cuda_complex (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all gather (0.002s) 2023-01-11T21:44:28.5540876Z 2023-01-11T21:44:28.5541347Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5541545Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5541580Z 2023-01-11T21:44:28.5541771Z OK (skipped=1) 2023-01-11T21:44:28.5541797Z 2023-01-11T21:44:28.5542018Z Generating XML reports... 
2023-01-11T21:44:28.5542890Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212131.xml 2023-01-11T21:44:28.5543632Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5543955Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5544696Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5545026Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5545078Z 2023-01-11T21:44:28.5545252Z Running tests... 2023-01-11T21:44:28.5545756Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5546362Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5546855Z test_all_gather_full_group (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.5547272Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 12949 2023-01-11T21:44:28.5547688Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 12950 2023-01-11T21:44:28.5548427Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5548733Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5549483Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5549841Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5550571Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5551032Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5551791Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5552152Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5552620Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.5553091Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.5553855Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.5554704Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.5555149Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.5555622Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:44:28.5556142Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.5556614Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:44:28.5557406Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.5558043Z STAGE:2023-01-11 21:21:37 12950:12950 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5558817Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.5559434Z STAGE:2023-01-11 21:21:37 12949:12949 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5559965Z [1673472097.531877] [7c5487d9c02b:12950:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.5560413Z [1673472099.180616] [7c5487d9c02b:12950:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5560862Z [1673472099.180616] [7c5487d9c02b:12950:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5561519Z STAGE:2023-01-11 21:21:39 12950:12950 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5562026Z [1673472097.511268] [7c5487d9c02b:12949:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.5562438Z [1673472099.175195] [7c5487d9c02b:12949:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5562860Z [1673472099.175195] [7c5487d9c02b:12949:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5563508Z STAGE:2023-01-11 21:21:39 12949:12949 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5564172Z STAGE:2023-01-11 21:21:39 12950:12950 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5564809Z STAGE:2023-01-11 21:21:39 12949:12949 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5565429Z STAGE:2023-01-11 21:21:39 12950:12950 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5566042Z STAGE:2023-01-11 21:21:39 12949:12949 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5566681Z STAGE:2023-01-11 21:21:39 12950:12950 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5567758Z STAGE:2023-01-11 21:21:39 12949:12949 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:21:39 12950:12950 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5567878Z 2023-01-11T21:44:28.5568576Z STAGE:2023-01-11 21:21:39 12949:12949 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5568754Z ok (6.644s) 2023-01-11T21:44:28.5568781Z 2023-01-11T21:44:28.5569276Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5569468Z Ran 1 test in 6.645s 2023-01-11T21:44:28.5569495Z 2023-01-11T21:44:28.5569653Z OK 2023-01-11T21:44:28.5569679Z 2023-01-11T21:44:28.5569877Z Generating XML reports... 
2023-01-11T21:44:28.5570756Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212133.xml 2023-01-11T21:44:28.5571475Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5571808Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5572555Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5572986Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5573016Z 2023-01-11T21:44:28.5573216Z Running tests... 2023-01-11T21:44:28.5573724Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5574294Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5574778Z test_all_gather_group (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.5575192Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 13063 2023-01-11T21:44:28.5575612Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 13064 2023-01-11T21:44:28.5576337Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5576932Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5577690Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5578057Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5578785Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5579085Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5579825Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5580185Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5580635Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.5581104Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.5581891Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.5582680Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.5583117Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.5583525Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.5583822Z skip: Skipped due to small world size. (4.245s) 2023-01-11T21:44:28.5583855Z 2023-01-11T21:44:28.5584353Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5584559Z Ran 1 test in 4.245s 2023-01-11T21:44:28.5584587Z 2023-01-11T21:44:28.5584778Z OK (skipped=1) 2023-01-11T21:44:28.5584804Z 2023-01-11T21:44:28.5585170Z Generating XML reports... 2023-01-11T21:44:28.5586052Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212142.xml 2023-01-11T21:44:28.5586781Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5587099Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5587806Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5588152Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5588181Z 2023-01-11T21:44:28.5588372Z Running tests... 2023-01-11T21:44:28.5588856Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5589448Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5590006Z test_all_gather_into_cat_tensor_cuda (__main__.TestDistBackendWithSpawn) ... 
skip: Only Nccl supports CUDA all_gather_into_tensor (0.002s) 2023-01-11T21:44:28.5590043Z 2023-01-11T21:44:28.5590535Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5590859Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5590888Z 2023-01-11T21:44:28.5591087Z OK (skipped=1) 2023-01-11T21:44:28.5591113Z 2023-01-11T21:44:28.5591307Z Generating XML reports... 2023-01-11T21:44:28.5592171Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212149.xml 2023-01-11T21:44:28.5592890Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5593212Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5593948Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5594305Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5594332Z 2023-01-11T21:44:28.5594523Z Running tests... 2023-01-11T21:44:28.5595021Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5595599Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5596172Z test_all_gather_into_stack_tensor_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all_gather_into_tensor (0.002s) 2023-01-11T21:44:28.5596200Z 2023-01-11T21:44:28.5596689Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5596892Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5596920Z 2023-01-11T21:44:28.5597107Z OK (skipped=1) 2023-01-11T21:44:28.5597134Z 2023-01-11T21:44:28.5597354Z Generating XML reports... 
2023-01-11T21:44:28.5598212Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212151.xml 2023-01-11T21:44:28.5598946Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5599275Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5599993Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5600351Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5600379Z 2023-01-11T21:44:28.5600575Z Running tests... 2023-01-11T21:44:28.5601075Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5601682Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5602222Z test_all_gather_multigpu (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl backend supports allgather multigpu (0.002s) 2023-01-11T21:44:28.5602250Z 2023-01-11T21:44:28.5602742Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5603038Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5603065Z 2023-01-11T21:44:28.5603255Z OK (skipped=1) 2023-01-11T21:44:28.5603281Z 2023-01-11T21:44:28.5603487Z Generating XML reports... 
2023-01-11T21:44:28.5604370Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212154.xml 2023-01-11T21:44:28.5605099Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5605428Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5606177Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5606535Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5606564Z 2023-01-11T21:44:28.5606762Z Running tests... 2023-01-11T21:44:28.5607262Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5607876Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5608511Z test_all_gather_multigpu_complex (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl backend supports allgather multigpu (0.002s) 2023-01-11T21:44:28.5608543Z 2023-01-11T21:44:28.5609056Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5609266Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5609293Z 2023-01-11T21:44:28.5609480Z OK (skipped=1) 2023-01-11T21:44:28.5609508Z 2023-01-11T21:44:28.5609733Z Generating XML reports... 
2023-01-11T21:44:28.5610612Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212156.xml 2023-01-11T21:44:28.5611343Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5611670Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5612438Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5612788Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5612815Z 2023-01-11T21:44:28.5613010Z Running tests... 2023-01-11T21:44:28.5613514Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5614125Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5614651Z test_all_gather_object_default_pg (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.5615071Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 13298 2023-01-11T21:44:28.5615487Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 13299 2023-01-11T21:44:28.5616223Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5616836Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5617631Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5617999Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5618733Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5619062Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5619817Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5620176Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5620647Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.5621256Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.5622044Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.5622840Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.5623286Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.5623726Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.5624251Z [1673472123.031573] [7c5487d9c02b:13299:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.5624687Z [1673472124.460778] [7c5487d9c02b:13299:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5625142Z [1673472124.460778] [7c5487d9c02b:13299:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5625761Z [1673472123.029352] [7c5487d9c02b:13298:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.5626212Z [1673472124.446320] [7c5487d9c02b:13298:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5626655Z [1673472124.446320] [7c5487d9c02b:13298:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5626818Z ok (7.246s) 2023-01-11T21:44:28.5626846Z 2023-01-11T21:44:28.5627358Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5627562Z Ran 1 test in 7.246s 2023-01-11T21:44:28.5627589Z 2023-01-11T21:44:28.5627753Z OK 2023-01-11T21:44:28.5627781Z 2023-01-11T21:44:28.5628017Z Generating XML reports... 
2023-01-11T21:44:28.5628908Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212159.xml 2023-01-11T21:44:28.5629664Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5629999Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5630741Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5631112Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5631139Z 2023-01-11T21:44:28.5631335Z Running tests... 2023-01-11T21:44:28.5631841Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5632463Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5632987Z test_all_gather_object_subgroup (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.5633421Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 13409 2023-01-11T21:44:28.5633851Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 13410 2023-01-11T21:44:28.5634600Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5634910Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5635675Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5636043Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5636785Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5637119Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5637981Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5638349Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5638808Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.5639253Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.5640051Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.5640850Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.5641300Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.5641747Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.5642224Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:44:28.5642771Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:44:28.5643592Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.5644386Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.5644830Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T21:44:28.5645294Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T21:44:28.5646177Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T21:44:28.5646989Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T21:44:28.5647463Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2023-01-11T21:44:28.5647927Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2023-01-11T21:44:28.5648729Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T21:44:28.5649512Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 
2023-01-11T21:44:28.5650051Z [1673472132.901324] [7c5487d9c02b:13409:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.5650490Z [1673472134.325634] [7c5487d9c02b:13409:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5650905Z [1673472134.325634] [7c5487d9c02b:13409:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5651410Z [1673472132.920962] [7c5487d9c02b:13410:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.5651823Z [1673472134.312853] [7c5487d9c02b:13410:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5652243Z [1673472134.312853] [7c5487d9c02b:13410:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5652418Z ok (7.651s) 2023-01-11T21:44:28.5652446Z 2023-01-11T21:44:28.5652939Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5653130Z Ran 1 test in 7.651s 2023-01-11T21:44:28.5653157Z 2023-01-11T21:44:28.5653310Z OK 2023-01-11T21:44:28.5653337Z 2023-01-11T21:44:28.5653672Z Generating XML reports... 
2023-01-11T21:44:28.5654572Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212208.xml 2023-01-11T21:44:28.5655298Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5655617Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5656348Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5657466Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5657502Z 2023-01-11T21:44:28.5657697Z Running tests... 2023-01-11T21:44:28.5658029Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5658352Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5658620Z test_all_gather_v_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports all_gather_v (0.002s) 2023-01-11T21:44:28.5658649Z 2023-01-11T21:44:28.5658895Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5659190Z Ran 1 test in 0.003s 2023-01-11T21:44:28.5659214Z 2023-01-11T21:44:28.5659340Z OK (skipped=1) 2023-01-11T21:44:28.5659359Z 2023-01-11T21:44:28.5659485Z Generating XML reports... 
2023-01-11T21:44:28.5659948Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212219.xml 2023-01-11T21:44:28.5660327Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5660506Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5660886Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5661076Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5661102Z 2023-01-11T21:44:28.5661195Z Running tests... 2023-01-11T21:44:28.5661456Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5661764Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5662182Z test_all_reduce_coalesced_full_group_max (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T21:44:28.5662202Z 2023-01-11T21:44:28.5662458Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5662569Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5662589Z 2023-01-11T21:44:28.5662695Z OK (skipped=1) 2023-01-11T21:44:28.5662714Z 2023-01-11T21:44:28.5662835Z Generating XML reports... 
2023-01-11T21:44:28.5663259Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212221.xml 2023-01-11T21:44:28.5663630Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5663804Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5664184Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5664371Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5664391Z 2023-01-11T21:44:28.5664500Z Running tests... 2023-01-11T21:44:28.5664760Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5665067Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5665478Z test_all_reduce_coalesced_full_group_min (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T21:44:28.5665498Z 2023-01-11T21:44:28.5665750Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5665939Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5665960Z 2023-01-11T21:44:28.5666068Z OK (skipped=1) 2023-01-11T21:44:28.5666087Z 2023-01-11T21:44:28.5666219Z Generating XML reports... 
2023-01-11T21:44:28.5666663Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212223.xml 2023-01-11T21:44:28.5667027Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5667201Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5667577Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5667767Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5667786Z 2023-01-11T21:44:28.5667876Z Running tests... 2023-01-11T21:44:28.5668133Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5668441Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5668912Z test_all_reduce_coalesced_full_group_product (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T21:44:28.5668934Z 2023-01-11T21:44:28.5669202Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5669315Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5669335Z 2023-01-11T21:44:28.5669442Z OK (skipped=1) 2023-01-11T21:44:28.5669465Z 2023-01-11T21:44:28.5669589Z Generating XML reports... 
2023-01-11T21:44:28.5670028Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212226.xml 2023-01-11T21:44:28.5670375Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5670550Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5670925Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5671115Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5671134Z 2023-01-11T21:44:28.5671242Z Running tests... 2023-01-11T21:44:28.5671501Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5671807Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5672218Z test_all_reduce_coalesced_full_group_sum (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T21:44:28.5672238Z 2023-01-11T21:44:28.5672492Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5672587Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5672607Z 2023-01-11T21:44:28.5672713Z OK (skipped=1) 2023-01-11T21:44:28.5672736Z 2023-01-11T21:44:28.5672857Z Generating XML reports... 
2023-01-11T21:44:28.5673296Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212228.xml 2023-01-11T21:44:28.5673661Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5673835Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5674207Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5674396Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5674415Z 2023-01-11T21:44:28.5674521Z Running tests... 2023-01-11T21:44:28.5674762Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5675065Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5675532Z test_all_reduce_coalesced_group_max (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T21:44:28.5675553Z 2023-01-11T21:44:28.5675815Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5675925Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5675944Z 2023-01-11T21:44:28.5676050Z OK (skipped=1) 2023-01-11T21:44:28.5676069Z 2023-01-11T21:44:28.5676190Z Generating XML reports... 
2023-01-11T21:44:28.5676624Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212231.xml 2023-01-11T21:44:28.5676968Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5677142Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5677515Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5677707Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5677726Z 2023-01-11T21:44:28.5677832Z Running tests... 2023-01-11T21:44:28.5678139Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5678458Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5678865Z test_all_reduce_coalesced_group_min (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T21:44:28.5678885Z 2023-01-11T21:44:28.5679146Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5679241Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5679260Z 2023-01-11T21:44:28.5679368Z OK (skipped=1) 2023-01-11T21:44:28.5679387Z 2023-01-11T21:44:28.5679507Z Generating XML reports... 
2023-01-11T21:44:28.5679941Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212233.xml 2023-01-11T21:44:28.5680308Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5680484Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5680857Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5681046Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5681066Z 2023-01-11T21:44:28.5681171Z Running tests... 2023-01-11T21:44:28.5681412Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5681716Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5682125Z test_all_reduce_coalesced_group_product (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T21:44:28.5682149Z 2023-01-11T21:44:28.5682404Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5682514Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5682533Z 2023-01-11T21:44:28.5682641Z OK (skipped=1) 2023-01-11T21:44:28.5682661Z 2023-01-11T21:44:28.5682782Z Generating XML reports... 
2023-01-11T21:44:28.5683217Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212235.xml 2023-01-11T21:44:28.5683581Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5683738Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5684110Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5684298Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5684317Z 2023-01-11T21:44:28.5684484Z Running tests... 2023-01-11T21:44:28.5684748Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5685061Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5685467Z test_all_reduce_coalesced_group_sum (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T21:44:28.5685488Z 2023-01-11T21:44:28.5685748Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5685842Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5685877Z 2023-01-11T21:44:28.5685965Z OK (skipped=1) 2023-01-11T21:44:28.5685984Z 2023-01-11T21:44:28.5686104Z Generating XML reports... 
2023-01-11T21:44:28.5686537Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212238.xml 2023-01-11T21:44:28.5686900Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5687075Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5687511Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5687710Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5687729Z 2023-01-11T21:44:28.5687838Z Running tests... 2023-01-11T21:44:28.5688084Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5688388Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5688782Z test_all_reduce_coalesced_max (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T21:44:28.5688802Z 2023-01-11T21:44:28.5689062Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5689173Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5689197Z 2023-01-11T21:44:28.5689306Z OK (skipped=1) 2023-01-11T21:44:28.5689325Z 2023-01-11T21:44:28.5689452Z Generating XML reports... 
2023-01-11T21:44:28.5689888Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212240.xml 2023-01-11T21:44:28.5690248Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5690405Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5690776Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5690962Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5690982Z 2023-01-11T21:44:28.5691087Z Running tests... 2023-01-11T21:44:28.5691343Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5691648Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5691946Z test_all_reduce_coalesced_max_complex_unsupported (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.5692165Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 13871 2023-01-11T21:44:28.5692364Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 13872 2023-01-11T21:44:28.5692728Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5692901Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5693272Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5693460Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5693818Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5694051Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5694428Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5694616Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5694841Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.5695080Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.5695475Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.5695867Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.5696095Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.5697261Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1714: UserWarning: torch.distributed.all_reduce_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2023-01-11T21:44:28.5697400Z warnings.warn( 2023-01-11T21:44:28.5697638Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.5698381Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1714: UserWarning: torch.distributed.all_reduce_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2023-01-11T21:44:28.5698493Z warnings.warn( 2023-01-11T21:44:28.5698575Z ok (4.253s) 2023-01-11T21:44:28.5698596Z 2023-01-11T21:44:28.5698860Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5698977Z Ran 1 test in 4.253s 2023-01-11T21:44:28.5698997Z 2023-01-11T21:44:28.5699086Z OK 2023-01-11T21:44:28.5699105Z 2023-01-11T21:44:28.5699227Z Generating XML reports... 
2023-01-11T21:44:28.5699668Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212243.xml 2023-01-11T21:44:28.5700036Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5700210Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5700563Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5700751Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5700770Z 2023-01-11T21:44:28.5700876Z Running tests... 2023-01-11T21:44:28.5701136Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5701443Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5701835Z test_all_reduce_coalesced_min (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T21:44:28.5701856Z 2023-01-11T21:44:28.5702113Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5702222Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5702242Z 2023-01-11T21:44:28.5702348Z OK (skipped=1) 2023-01-11T21:44:28.5702367Z 2023-01-11T21:44:28.5702472Z Generating XML reports... 
2023-01-11T21:44:28.5702906Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212249.xml 2023-01-11T21:44:28.5703269Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5703445Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5703905Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5704097Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5704117Z 2023-01-11T21:44:28.5704226Z Running tests... 2023-01-11T21:44:28.5704484Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5704790Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5705175Z test_all_reduce_coalesced_product (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T21:44:28.5705194Z 2023-01-11T21:44:28.5705449Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5705560Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5705579Z 2023-01-11T21:44:28.5705686Z OK (skipped=1) 2023-01-11T21:44:28.5705705Z 2023-01-11T21:44:28.5705832Z Generating XML reports... 
2023-01-11T21:44:28.5706272Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212252.xml 2023-01-11T21:44:28.5706682Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5706863Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5707240Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5707410Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5707429Z 2023-01-11T21:44:28.5707537Z Running tests... 2023-01-11T21:44:28.5707793Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5708099Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5708492Z test_all_reduce_coalesced_sum (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T21:44:28.5708516Z 2023-01-11T21:44:28.5708776Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5708886Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5708905Z 2023-01-11T21:44:28.5709011Z OK (skipped=1) 2023-01-11T21:44:28.5709030Z 2023-01-11T21:44:28.5709135Z Generating XML reports... 
2023-01-11T21:44:28.5709569Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212254.xml 2023-01-11T21:44:28.5709931Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5710102Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5710471Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5710661Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5710681Z 2023-01-11T21:44:28.5710787Z Running tests... 2023-01-11T21:44:28.5711045Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5711350Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5711612Z test_all_reduce_complex_unsupported_ops (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.5711832Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 14073 2023-01-11T21:44:28.5712047Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 14074 2023-01-11T21:44:28.5712411Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5712582Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5712954Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5713199Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5713567Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5713721Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5714088Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5714276Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5714519Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.5714759Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.5715151Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.5715546Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.5715817Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.5716049Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.5716132Z ok (4.290s) 2023-01-11T21:44:28.5716170Z 2023-01-11T21:44:28.5716417Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5716528Z Ran 1 test in 4.290s 2023-01-11T21:44:28.5716547Z 2023-01-11T21:44:28.5716638Z OK 2023-01-11T21:44:28.5716657Z 2023-01-11T21:44:28.5716779Z Generating XML reports... 2023-01-11T21:44:28.5717217Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212257.xml 2023-01-11T21:44:28.5717588Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5717762Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5718136Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5718307Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5718326Z 2023-01-11T21:44:28.5718434Z Running tests... 2023-01-11T21:44:28.5718692Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5718997Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5719261Z test_all_reduce_full_group_max (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.5719476Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 14176 2023-01-11T21:44:28.5719696Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 14177 2023-01-11T21:44:28.5720061Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5720220Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5720595Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5720783Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5721142Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5721313Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5721681Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5721868Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5722186Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.5722433Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.5722816Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.5723208Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.5723431Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.5723672Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:44:28.5723894Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.5724129Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:44:28.5724523Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.5724958Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.5725306Z STAGE:2023-01-11 21:23:07 14176:14176 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5725608Z STAGE:2023-01-11 21:23:07 14177:14177 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5725884Z [1673472187.941532] [7c5487d9c02b:14176:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.5726116Z [1673472189.617398] [7c5487d9c02b:14176:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5726355Z [1673472189.617398] [7c5487d9c02b:14176:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5726631Z [1673472187.942219] [7c5487d9c02b:14177:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.5726856Z [1673472189.588375] [7c5487d9c02b:14177:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5727088Z [1673472189.588375] [7c5487d9c02b:14177:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5727637Z STAGE:2023-01-11 21:23:09 14176:14176 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:23:09 14177:14177 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5727657Z 2023-01-11T21:44:28.5728004Z STAGE:2023-01-11 21:23:09 14177:14177 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5728348Z STAGE:2023-01-11 21:23:09 14176:14176 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5728676Z STAGE:2023-01-11 21:23:10 14176:14176 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5728977Z STAGE:2023-01-11 21:23:10 14177:14177 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5729304Z STAGE:2023-01-11 21:23:10 14176:14176 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5729625Z STAGE:2023-01-11 21:23:10 14177:14177 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5729963Z STAGE:2023-01-11 21:23:10 14176:14176 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5730300Z STAGE:2023-01-11 21:23:10 14177:14177 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5730402Z ok (6.659s) 2023-01-11T21:44:28.5730421Z 2023-01-11T21:44:28.5730740Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5730854Z Ran 1 test in 6.659s 2023-01-11T21:44:28.5730873Z 2023-01-11T21:44:28.5730947Z OK 2023-01-11T21:44:28.5730985Z 2023-01-11T21:44:28.5731094Z Generating XML reports... 
2023-01-11T21:44:28.5731541Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212303.xml 2023-01-11T21:44:28.5731903Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5732075Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5732447Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5732635Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5732655Z 2023-01-11T21:44:28.5732761Z Running tests... 2023-01-11T21:44:28.5733023Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5733312Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5733624Z test_all_reduce_full_group_min (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.5733847Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 14290 2023-01-11T21:44:28.5734063Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 14291 2023-01-11T21:44:28.5734429Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5734602Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5734972Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5735159Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5735501Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5735676Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5736045Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5736228Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5736469Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.5736888Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.5737301Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.5737696Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.5737926Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.5738149Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:44:28.5738369Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.5738599Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:44:28.5738993Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.5739324Z STAGE:2023-01-11 21:23:17 14291:14291 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5739711Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.5740032Z STAGE:2023-01-11 21:23:17 14290:14290 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5740402Z [1673472197.078449] [7c5487d9c02b:14291:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.5740633Z [1673472198.732112] [7c5487d9c02b:14291:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5740872Z [1673472198.732112] [7c5487d9c02b:14291:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5741124Z [1673472197.076066] [7c5487d9c02b:14290:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.5741351Z [1673472198.721960] [7c5487d9c02b:14290:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5741587Z [1673472198.721960] [7c5487d9c02b:14290:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5742209Z STAGE:2023-01-11 21:23:19 14291:14291 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:23:19 14290:14290 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5742231Z 2023-01-11T21:44:28.5742588Z STAGE:2023-01-11 21:23:19 14291:14291 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5742931Z STAGE:2023-01-11 21:23:19 14290:14290 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5743256Z STAGE:2023-01-11 21:23:19 14290:14290 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5743576Z STAGE:2023-01-11 21:23:19 14291:14291 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5743904Z STAGE:2023-01-11 21:23:19 14290:14290 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5744248Z STAGE:2023-01-11 21:23:19 14290:14290 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5744558Z STAGE:2023-01-11 21:23:19 14291:14291 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5744896Z STAGE:2023-01-11 21:23:19 14291:14291 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5744997Z ok (6.571s) 2023-01-11T21:44:28.5745016Z 2023-01-11T21:44:28.5745280Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5745392Z Ran 1 test in 6.571s 2023-01-11T21:44:28.5745412Z 2023-01-11T21:44:28.5745502Z OK 2023-01-11T21:44:28.5745521Z 2023-01-11T21:44:28.5745642Z Generating XML reports... 
2023-01-11T21:44:28.5746084Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212313.xml 2023-01-11T21:44:28.5746449Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5746610Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5746986Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5747174Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5747193Z 2023-01-11T21:44:28.5747299Z Running tests... 2023-01-11T21:44:28.5747557Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5747870Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5748143Z test_all_reduce_full_group_product (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.5748359Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 14404 2023-01-11T21:44:28.5748555Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 14405 2023-01-11T21:44:28.5748983Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5749158Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5749537Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5749728Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5750086Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5750256Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5750625Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5750807Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5751034Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.5751274Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.5751714Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.5752116Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.5752343Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.5752578Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:44:28.5752798Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.5753029Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:44:28.5753424Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.5753845Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.5754177Z STAGE:2023-01-11 21:23:26 14404:14404 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5754492Z STAGE:2023-01-11 21:23:26 14405:14405 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5754763Z [1673472206.295125] [7c5487d9c02b:14404:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.5754987Z [1673472207.932033] [7c5487d9c02b:14404:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5755217Z [1673472207.932033] [7c5487d9c02b:14404:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5755486Z [1673472206.315035] [7c5487d9c02b:14405:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.5755716Z [1673472207.913816] [7c5487d9c02b:14405:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5755950Z [1673472207.913816] [7c5487d9c02b:14405:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5756495Z STAGE:2023-01-11 21:23:28 14404:14404 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:23:28 14405:14405 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5756516Z 2023-01-11T21:44:28.5756842Z STAGE:2023-01-11 21:23:28 14405:14405 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5757182Z STAGE:2023-01-11 21:23:28 14404:14404 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5757771Z STAGE:2023-01-11 21:23:28 14404:14404 ActivityProfilerController.cpp:300] Completed Stage: Warm UpSTAGE:2023-01-11 21:23:28 14405:14405 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5757791Z 2023-01-11T21:44:28.5758325Z STAGE:2023-01-11 21:23:28 14405:14405 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:23:28 14404:14404 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5758346Z 2023-01-11T21:44:28.5758910Z STAGE:2023-01-11 21:23:28 14405:14405 ActivityProfilerController.cpp:310] Completed Stage: Post ProcessingSTAGE:2023-01-11 21:23:28 14404:14404 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5758930Z 2023-01-11T21:44:28.5759029Z ok (6.570s) 2023-01-11T21:44:28.5759048Z 2023-01-11T21:44:28.5759307Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5759421Z Ran 1 test in 6.570s 2023-01-11T21:44:28.5759440Z 2023-01-11T21:44:28.5759531Z OK 2023-01-11T21:44:28.5759550Z 2023-01-11T21:44:28.5759672Z Generating XML reports... 
2023-01-11T21:44:28.5760160Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212322.xml 2023-01-11T21:44:28.5760517Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5760692Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5761069Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5761263Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5761284Z 2023-01-11T21:44:28.5761391Z Running tests... 2023-01-11T21:44:28.5761647Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5761960Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5762227Z test_all_reduce_full_group_sum (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.5762442Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 14518 2023-01-11T21:44:28.5762638Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 14519 2023-01-11T21:44:28.5763003Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5763175Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5763548Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5763734Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5764091Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5764264Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5764635Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5764804Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5765045Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.5765288Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.5765683Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.5766072Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.5766298Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.5766620Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:44:28.5766846Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.5767080Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:44:28.5767455Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.5767842Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.5768172Z STAGE:2023-01-11 21:23:35 14519:14519 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5768492Z STAGE:2023-01-11 21:23:35 14518:14518 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5768766Z [1673472215.344372] [7c5487d9c02b:14518:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.5769040Z [1673472217.002422] [7c5487d9c02b:14518:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5769283Z [1673472217.002422] [7c5487d9c02b:14518:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5769550Z [1673472215.364829] [7c5487d9c02b:14519:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.5769774Z [1673472216.965971] [7c5487d9c02b:14519:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5770005Z [1673472216.965971] [7c5487d9c02b:14519:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5770557Z STAGE:2023-01-11 21:23:37 14518:14518 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:23:37 14519:14519 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5770582Z 2023-01-11T21:44:28.5770912Z STAGE:2023-01-11 21:23:37 14519:14519 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5771252Z STAGE:2023-01-11 21:23:37 14518:14518 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5771571Z STAGE:2023-01-11 21:23:37 14519:14519 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5771883Z STAGE:2023-01-11 21:23:37 14518:14518 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5772209Z STAGE:2023-01-11 21:23:37 14519:14519 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5772528Z STAGE:2023-01-11 21:23:37 14518:14518 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5772868Z STAGE:2023-01-11 21:23:37 14519:14519 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5773207Z STAGE:2023-01-11 21:23:37 14518:14518 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5773306Z ok (6.499s) 2023-01-11T21:44:28.5773325Z 2023-01-11T21:44:28.5773568Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5773676Z Ran 1 test in 6.499s 2023-01-11T21:44:28.5773696Z 2023-01-11T21:44:28.5773782Z OK 2023-01-11T21:44:28.5773801Z 2023-01-11T21:44:28.5773921Z Generating XML reports... 
2023-01-11T21:44:28.5774358Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212331.xml 2023-01-11T21:44:28.5774725Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5774896Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5775329Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5775521Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5775541Z 2023-01-11T21:44:28.5775632Z Running tests... 2023-01-11T21:44:28.5775890Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5776192Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5776444Z test_all_reduce_group_max (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.5776845Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 14632 2023-01-11T21:44:28.5777066Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 14633 2023-01-11T21:44:28.5777440Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5777617Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5778048Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5778250Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5778615Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5778786Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5779157Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5779342Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5779586Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.5779829Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.5780224Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.5780599Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.5780823Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.5781043Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.5781198Z skip: Skipped due to small world size. (4.234s) 2023-01-11T21:44:28.5781218Z 2023-01-11T21:44:28.5781478Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5781586Z Ran 1 test in 4.234s 2023-01-11T21:44:28.5781605Z 2023-01-11T21:44:28.5781707Z OK (skipped=1) 2023-01-11T21:44:28.5781726Z 2023-01-11T21:44:28.5781848Z Generating XML reports... 2023-01-11T21:44:28.5782272Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212340.xml 2023-01-11T21:44:28.5782636Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5782806Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5783179Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5783363Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5783382Z 2023-01-11T21:44:28.5783486Z Running tests... 2023-01-11T21:44:28.5783741Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5784044Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5784298Z test_all_reduce_group_min (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.5784587Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 14735 2023-01-11T21:44:28.5784803Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 14736 2023-01-11T21:44:28.5785169Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5785337Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5785706Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5785890Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5786243Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5786410Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5786759Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5786990Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5787242Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.5787484Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.5787881Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.5788272Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.5788494Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.5788717Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.5788875Z skip: Skipped due to small world size. (4.195s) 2023-01-11T21:44:28.5788895Z 2023-01-11T21:44:28.5789141Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5789250Z Ran 1 test in 4.195s 2023-01-11T21:44:28.5789270Z 2023-01-11T21:44:28.5789374Z OK (skipped=1) 2023-01-11T21:44:28.5789393Z 2023-01-11T21:44:28.5789510Z Generating XML reports... 2023-01-11T21:44:28.5789950Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212347.xml 2023-01-11T21:44:28.5790314Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5790486Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5790859Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5791051Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5791071Z 2023-01-11T21:44:28.5791160Z Running tests... 2023-01-11T21:44:28.5791424Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5791727Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5791989Z test_all_reduce_group_product (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.5792203Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 14838 2023-01-11T21:44:28.5792410Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 14839 2023-01-11T21:44:28.5792770Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5792937Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5793290Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5793526Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5793886Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5794056Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5794419Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5794602Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5794847Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.5795087Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.5795477Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.5795853Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.5796127Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.5796357Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.5796515Z skip: Skipped due to small world size. (4.219s) 2023-01-11T21:44:28.5796535Z 2023-01-11T21:44:28.5796799Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5796908Z Ran 1 test in 4.219s 2023-01-11T21:44:28.5796927Z 2023-01-11T21:44:28.5797030Z OK (skipped=1) 2023-01-11T21:44:28.5797049Z 2023-01-11T21:44:28.5797170Z Generating XML reports... 2023-01-11T21:44:28.5797611Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212354.xml 2023-01-11T21:44:28.5797964Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5798142Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5798514Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5798701Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5798720Z 2023-01-11T21:44:28.5798826Z Running tests... 2023-01-11T21:44:28.5799085Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5799390Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5799644Z test_all_reduce_group_sum (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.5799842Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 14941 2023-01-11T21:44:28.5800060Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 14942 2023-01-11T21:44:28.5800428Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5800601Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5800972Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5801157Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5801517Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5801688Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5802055Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5802278Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5802521Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.5802766Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.5803163Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.5803553Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.5803780Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.5804001Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.5804157Z skip: Skipped due to small world size. (4.259s) 2023-01-11T21:44:28.5804177Z 2023-01-11T21:44:28.5804438Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5804535Z Ran 1 test in 4.259s 2023-01-11T21:44:28.5804555Z 2023-01-11T21:44:28.5804661Z OK (skipped=1) 2023-01-11T21:44:28.5804680Z 2023-01-11T21:44:28.5804852Z Generating XML reports... 2023-01-11T21:44:28.5805308Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212400.xml 2023-01-11T21:44:28.5805674Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5805846Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5806217Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5806404Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5806424Z 2023-01-11T21:44:28.5806514Z Running tests... 2023-01-11T21:44:28.5806781Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5807082Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5807330Z test_all_reduce_max (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.5807544Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 15044 2023-01-11T21:44:28.5807753Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 15045 2023-01-11T21:44:28.5808118Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5808293Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5808664Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5808835Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5809195Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5809360Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5809729Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5809917Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5810155Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.5810396Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.5810790Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.5811164Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.5811445Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.5811785Z STAGE:2023-01-11 21:24:11 15045:15045 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5812009Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.5812338Z STAGE:2023-01-11 21:24:11 15044:15044 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5812609Z [1673472251.573797] [7c5487d9c02b:15044:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.5812839Z [1673472253.197921] [7c5487d9c02b:15044:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5813073Z [1673472253.197921] [7c5487d9c02b:15044:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5813346Z [1673472251.574350] [7c5487d9c02b:15045:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.5813619Z [1673472253.218158] [7c5487d9c02b:15045:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5813842Z [1673472253.218158] [7c5487d9c02b:15045:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5814394Z STAGE:2023-01-11 21:24:13 15044:15044 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:24:13 15045:15045 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5814415Z 2023-01-11T21:44:28.5814760Z STAGE:2023-01-11 21:24:13 15045:15045 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5815105Z STAGE:2023-01-11 21:24:13 15044:15044 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5815431Z STAGE:2023-01-11 21:24:13 15045:15045 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5815749Z STAGE:2023-01-11 21:24:13 15044:15044 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5816076Z STAGE:2023-01-11 21:24:13 15045:15045 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5816405Z STAGE:2023-01-11 21:24:13 15044:15044 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5816981Z STAGE:2023-01-11 21:24:13 15045:15045 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5817336Z STAGE:2023-01-11 21:24:13 15044:15044 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5817420Z ok (6.643s) 2023-01-11T21:44:28.5817441Z 2023-01-11T21:44:28.5817706Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5817820Z Ran 1 test in 6.643s 2023-01-11T21:44:28.5817839Z 2023-01-11T21:44:28.5817929Z OK 2023-01-11T21:44:28.5817948Z 2023-01-11T21:44:28.5818069Z Generating XML reports... 
2023-01-11T21:44:28.5818512Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212407.xml 2023-01-11T21:44:28.5818875Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5819047Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5819421Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5819592Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5819612Z 2023-01-11T21:44:28.5819718Z Running tests... 2023-01-11T21:44:28.5819980Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5820284Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5820618Z test_all_reduce_min (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.5820839Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 15158 2023-01-11T21:44:28.5821052Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 15159 2023-01-11T21:44:28.5821422Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5821579Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5821951Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5822134Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5822493Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5822667Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5823089Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5823285Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5823528Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.5823769Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.5824149Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.5824541Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.5824769Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.5825105Z STAGE:2023-01-11 21:24:20 15159:15159 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5825330Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.5825651Z STAGE:2023-01-11 21:24:20 15158:15158 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5825921Z [1673472260.779142] [7c5487d9c02b:15159:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.5826149Z [1673472262.387640] [7c5487d9c02b:15159:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5826383Z [1673472262.387640] [7c5487d9c02b:15159:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5826701Z STAGE:2023-01-11 21:24:22 15159:15159 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5826968Z [1673472260.759309] [7c5487d9c02b:15158:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.5827198Z [1673472262.391274] [7c5487d9c02b:15158:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5827431Z [1673472262.391274] [7c5487d9c02b:15158:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5827766Z STAGE:2023-01-11 21:24:22 15158:15158 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5828109Z STAGE:2023-01-11 21:24:22 15159:15159 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5828452Z STAGE:2023-01-11 21:24:22 15158:15158 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5828776Z STAGE:2023-01-11 21:24:22 15159:15159 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5829161Z STAGE:2023-01-11 21:24:22 15158:15158 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5829478Z STAGE:2023-01-11 21:24:22 15159:15159 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5829803Z STAGE:2023-01-11 21:24:22 15158:15158 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5830364Z STAGE:2023-01-11 21:24:22 15159:15159 ActivityProfilerController.cpp:310] Completed Stage: Post ProcessingSTAGE:2023-01-11 21:24:22 15158:15158 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5830385Z 2023-01-11T21:44:28.5830484Z ok (6.566s) 2023-01-11T21:44:28.5830504Z 2023-01-11T21:44:28.5830765Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5830871Z Ran 1 test in 6.567s 2023-01-11T21:44:28.5830890Z 2023-01-11T21:44:28.5830980Z OK 2023-01-11T21:44:28.5831002Z 2023-01-11T21:44:28.5831122Z Generating XML reports... 
2023-01-11T21:44:28.5831563Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212416.xml 2023-01-11T21:44:28.5831992Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5832159Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5832538Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5832728Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5832748Z 2023-01-11T21:44:28.5832856Z Running tests... 2023-01-11T21:44:28.5833115Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5833422Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5833685Z test_all_reduce_multigpu (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.5833903Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 15272 2023-01-11T21:44:28.5834102Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 15273 2023-01-11T21:44:28.5834468Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5834639Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5835009Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5835198Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5835554Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5835723Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5836092Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5836281Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5836507Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.5836747Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.5837141Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.5837530Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.5837756Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.5837982Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.5838370Z STAGE:2023-01-11 21:24:31 15273:15273 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5839136Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1582: UserWarning: torch.distributed.all_reduce_multigpu will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#multi-gpu-collective-functions 2023-01-11T21:44:28.5839247Z warnings.warn( 2023-01-11T21:44:28.5839558Z STAGE:2023-01-11 21:24:31 15272:15272 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5840311Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1582: UserWarning: torch.distributed.all_reduce_multigpu will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#multi-gpu-collective-functions 2023-01-11T21:44:28.5840426Z warnings.warn( 2023-01-11T21:44:28.5840696Z [1673472271.486363] [7c5487d9c02b:15273:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.5840969Z [1673472271.501217] [7c5487d9c02b:15273:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5841212Z [1673472271.501217] [7c5487d9c02b:15273:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5841478Z [1673472271.484632] [7c5487d9c02b:15272:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.5841707Z [1673472271.499515] [7c5487d9c02b:15272:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5841941Z [1673472271.499515] [7c5487d9c02b:15272:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5842495Z STAGE:2023-01-11 21:24:31 15273:15273 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:24:31 15272:15272 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5842517Z 2023-01-11T21:44:28.5842858Z STAGE:2023-01-11 21:24:31 15272:15272 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5843182Z STAGE:2023-01-11 21:24:31 15273:15273 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5843507Z STAGE:2023-01-11 21:24:32 15272:15272 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5843821Z STAGE:2023-01-11 21:24:32 15273:15273 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5844146Z STAGE:2023-01-11 21:24:32 15272:15272 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5844484Z STAGE:2023-01-11 21:24:32 15272:15272 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5844814Z STAGE:2023-01-11 21:24:32 15273:15273 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5845152Z STAGE:2023-01-11 21:24:32 15273:15273 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5845250Z ok (6.546s) 2023-01-11T21:44:28.5845269Z 2023-01-11T21:44:28.5845529Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5845623Z Ran 1 test in 6.546s 2023-01-11T21:44:28.5845642Z 2023-01-11T21:44:28.5845732Z OK 2023-01-11T21:44:28.5845751Z 2023-01-11T21:44:28.5845870Z Generating XML reports... 
2023-01-11T21:44:28.5846310Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212425.xml 2023-01-11T21:44:28.5846676Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5846918Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5847298Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5847491Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5847512Z 2023-01-11T21:44:28.5847621Z Running tests... 2023-01-11T21:44:28.5847863Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5848169Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5848441Z test_all_reduce_multigpu_complex (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.5848659Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 15390 2023-01-11T21:44:28.5848874Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 15391 2023-01-11T21:44:28.5849241Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5849417Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5849835Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5850013Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5850377Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5850549Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5850923Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5851111Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5851357Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.5851601Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.5851996Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.5852385Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.5852596Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.5852818Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.5853147Z STAGE:2023-01-11 21:24:40 15391:15391 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5853951Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1582: UserWarning: torch.distributed.all_reduce_multigpu will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#multi-gpu-collective-functions 2023-01-11T21:44:28.5854073Z warnings.warn( 2023-01-11T21:44:28.5854404Z STAGE:2023-01-11 21:24:40 15390:15390 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5855158Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1582: UserWarning: torch.distributed.all_reduce_multigpu will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#multi-gpu-collective-functions 2023-01-11T21:44:28.5855268Z warnings.warn( 2023-01-11T21:44:28.5855541Z [1673472280.599107] [7c5487d9c02b:15390:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.5855772Z [1673472280.614037] [7c5487d9c02b:15390:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5856052Z [1673472280.614037] [7c5487d9c02b:15390:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5856324Z [1673472280.604109] [7c5487d9c02b:15391:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.5856764Z [1673472280.618827] [7c5487d9c02b:15391:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5857016Z [1673472280.618827] [7c5487d9c02b:15391:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5857576Z STAGE:2023-01-11 21:24:41 15390:15390 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:24:41 15391:15391 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5857597Z 2023-01-11T21:44:28.5857940Z STAGE:2023-01-11 21:24:41 15391:15391 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5858288Z STAGE:2023-01-11 21:24:41 15390:15390 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5858687Z STAGE:2023-01-11 21:24:41 15390:15390 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5859033Z STAGE:2023-01-11 21:24:41 15390:15390 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5859372Z STAGE:2023-01-11 21:24:41 15390:15390 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5859694Z STAGE:2023-01-11 21:24:41 15391:15391 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5860004Z STAGE:2023-01-11 21:24:41 15391:15391 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5860343Z STAGE:2023-01-11 21:24:41 15391:15391 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5860444Z ok (6.748s) 2023-01-11T21:44:28.5860467Z 2023-01-11T21:44:28.5860727Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5860837Z Ran 1 test in 6.749s 2023-01-11T21:44:28.5860857Z 2023-01-11T21:44:28.5860947Z OK 2023-01-11T21:44:28.5860969Z 2023-01-11T21:44:28.5861092Z Generating XML reports... 
2023-01-11T21:44:28.5861534Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212435.xml 2023-01-11T21:44:28.5861884Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5862061Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5862435Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5862624Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5862644Z 2023-01-11T21:44:28.5862751Z Running tests... 2023-01-11T21:44:28.5863008Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5863315Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5863572Z test_all_reduce_product (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.5863787Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 15508 2023-01-11T21:44:28.5863984Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 15509 2023-01-11T21:44:28.5864349Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5864519Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5864896Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5865081Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5865518Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5865694Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5866067Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5866236Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5866481Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.5866722Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.5867118Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.5867511Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.5867739Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.5868115Z STAGE:2023-01-11 21:24:48 15509:15509 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5868351Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.5868680Z STAGE:2023-01-11 21:24:48 15508:15508 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5868932Z [1673472288.120655] [7c5487d9c02b:15509:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.5869167Z [1673472289.748202] [7c5487d9c02b:15509:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5869404Z [1673472289.748202] [7c5487d9c02b:15509:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5869679Z [1673472288.114113] [7c5487d9c02b:15508:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.5869905Z [1673472289.756009] [7c5487d9c02b:15508:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5870135Z [1673472289.756009] [7c5487d9c02b:15508:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5870676Z STAGE:2023-01-11 21:24:50 15509:15509 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:24:50 15508:15508 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5870696Z 2023-01-11T21:44:28.5871042Z STAGE:2023-01-11 21:24:50 15509:15509 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5871382Z STAGE:2023-01-11 21:24:50 15508:15508 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5871710Z STAGE:2023-01-11 21:24:50 15509:15509 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5872031Z STAGE:2023-01-11 21:24:50 15508:15508 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5872346Z STAGE:2023-01-11 21:24:50 15509:15509 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5872670Z STAGE:2023-01-11 21:24:50 15508:15508 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5873012Z STAGE:2023-01-11 21:24:50 15509:15509 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5873349Z STAGE:2023-01-11 21:24:50 15508:15508 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5873449Z ok (6.594s) 2023-01-11T21:44:28.5873468Z 2023-01-11T21:44:28.5873730Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5873896Z Ran 1 test in 6.595s 2023-01-11T21:44:28.5873915Z 2023-01-11T21:44:28.5874008Z OK 2023-01-11T21:44:28.5874026Z 2023-01-11T21:44:28.5874132Z Generating XML reports... 
2023-01-11T21:44:28.5874582Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212444.xml 2023-01-11T21:44:28.5874949Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5875123Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5875495Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5875683Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5875702Z 2023-01-11T21:44:28.5875809Z Running tests... 2023-01-11T21:44:28.5876065Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5876379Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5876673Z test_all_reduce_result_cuda (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.5876899Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 15622 2023-01-11T21:44:28.5877115Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 15623 2023-01-11T21:44:28.5877482Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5877653Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5878025Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5878213Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5878569Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5878726Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5879098Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5879282Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5879524Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.5879763Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.5880157Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.5880548Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.5880780Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.5881005Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.5881262Z [1673472298.706495] [7c5487d9c02b:15623:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.5881489Z [1673472298.719648] [7c5487d9c02b:15623:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5881723Z [1673472298.719648] [7c5487d9c02b:15623:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5881991Z [1673472298.701123] [7c5487d9c02b:15622:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.5882215Z [1673472298.714735] [7c5487d9c02b:15622:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5882505Z [1673472298.714735] [7c5487d9c02b:15622:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5882611Z ok (6.144s) 2023-01-11T21:44:28.5882631Z 2023-01-11T21:44:28.5882903Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5883020Z Ran 1 test in 6.144s 2023-01-11T21:44:28.5883040Z 2023-01-11T21:44:28.5883133Z OK 2023-01-11T21:44:28.5883152Z 2023-01-11T21:44:28.5883256Z Generating XML reports... 
2023-01-11T21:44:28.5883695Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212453.xml 2023-01-11T21:44:28.5884063Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5884235Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5884605Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5884797Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5884816Z 2023-01-11T21:44:28.5884970Z Running tests... 2023-01-11T21:44:28.5885240Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5885531Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5885780Z test_all_reduce_sum (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.5885997Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 15736 2023-01-11T21:44:28.5886214Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 15737 2023-01-11T21:44:28.5886581Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5886755Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5887133Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5887322Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5887681Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5887834Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5888199Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5888384Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5888624Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.5888863Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.5889260Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.5889655Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.5889884Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.5890213Z STAGE:2023-01-11 21:25:05 15737:15737 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5890421Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.5890750Z STAGE:2023-01-11 21:25:05 15736:15736 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5891019Z [1673472305.972457] [7c5487d9c02b:15737:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.5891301Z [1673472307.588967] [7c5487d9c02b:15737:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5891541Z [1673472307.588967] [7c5487d9c02b:15737:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5891810Z [1673472305.951640] [7c5487d9c02b:15736:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.5892035Z [1673472307.609638] [7c5487d9c02b:15736:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5892268Z [1673472307.609638] [7c5487d9c02b:15736:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5892816Z STAGE:2023-01-11 21:25:07 15737:15737 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:25:07 15736:15736 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5892839Z 2023-01-11T21:44:28.5893183Z STAGE:2023-01-11 21:25:07 15737:15737 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5893553Z STAGE:2023-01-11 21:25:07 15736:15736 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5893889Z STAGE:2023-01-11 21:25:08 15737:15737 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5894208Z STAGE:2023-01-11 21:25:08 15736:15736 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5894539Z STAGE:2023-01-11 21:25:08 15737:15737 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5894864Z STAGE:2023-01-11 21:25:08 15736:15736 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5895205Z STAGE:2023-01-11 21:25:08 15737:15737 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5895540Z STAGE:2023-01-11 21:25:08 15736:15736 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5895644Z ok (6.712s) 2023-01-11T21:44:28.5895664Z 2023-01-11T21:44:28.5895924Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5896019Z Ran 1 test in 6.712s 2023-01-11T21:44:28.5896038Z 2023-01-11T21:44:28.5896130Z OK 2023-01-11T21:44:28.5896149Z 2023-01-11T21:44:28.5896270Z Generating XML reports... 
2023-01-11T21:44:28.5896939Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212502.xml 2023-01-11T21:44:28.5897324Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5897501Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5897876Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5898069Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5898089Z 2023-01-11T21:44:28.5898196Z Running tests... 2023-01-11T21:44:28.5898441Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5898747Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5899002Z test_all_reduce_sum_async (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.5899219Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 15850 2023-01-11T21:44:28.5899429Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 15851 2023-01-11T21:44:28.5899798Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5899971Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5900345Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5900601Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5900969Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5901141Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5901513Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5901699Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5901944Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.5902186Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.5902579Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.5902973Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.5903244Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.5903589Z STAGE:2023-01-11 21:25:15 15851:15851 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5903815Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.5904140Z STAGE:2023-01-11 21:25:15 15850:15850 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5904410Z [1673472315.226286] [7c5487d9c02b:15850:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.5904637Z [1673472316.869553] [7c5487d9c02b:15850:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5904877Z [1673472316.869553] [7c5487d9c02b:15850:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5905147Z [1673472315.227015] [7c5487d9c02b:15851:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.5905372Z [1673472316.869266] [7c5487d9c02b:15851:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5905602Z [1673472316.869266] [7c5487d9c02b:15851:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5906129Z STAGE:2023-01-11 21:25:17 15850:15850 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:25:17 15851:15851 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5906171Z 2023-01-11T21:44:28.5906497Z STAGE:2023-01-11 21:25:17 15851:15851 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5906843Z STAGE:2023-01-11 21:25:17 15850:15850 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5907171Z STAGE:2023-01-11 21:25:17 15851:15851 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5907487Z STAGE:2023-01-11 21:25:17 15850:15850 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5907813Z STAGE:2023-01-11 21:25:17 15851:15851 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5908154Z STAGE:2023-01-11 21:25:17 15851:15851 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5908482Z STAGE:2023-01-11 21:25:17 15850:15850 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5908821Z STAGE:2023-01-11 21:25:17 15850:15850 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5908975Z ok (6.637s) 2023-01-11T21:44:28.5909014Z 2023-01-11T21:44:28.5909263Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5909373Z Ran 1 test in 6.637s 2023-01-11T21:44:28.5909396Z 2023-01-11T21:44:28.5909486Z OK 2023-01-11T21:44:28.5909505Z 2023-01-11T21:44:28.5909627Z Generating XML reports... 
2023-01-11T21:44:28.5910069Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212511.xml 2023-01-11T21:44:28.5910434Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5910606Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5910979Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5911150Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5911173Z 2023-01-11T21:44:28.5911280Z Running tests... 2023-01-11T21:44:28.5911540Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5911908Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5912178Z test_all_reduce_sum_complex (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.5912398Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 15964 2023-01-11T21:44:28.5912614Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 15965 2023-01-11T21:44:28.5912984Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5913140Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5913513Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5913704Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5914065Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5914234Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5914606Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5914791Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5915033Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.5915271Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.5915652Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.5916045Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.5916273Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.5916603Z STAGE:2023-01-11 21:25:24 15964:15964 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5916827Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.5917152Z STAGE:2023-01-11 21:25:24 15965:15965 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5917422Z [1673472324.434476] [7c5487d9c02b:15965:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.5917649Z [1673472326.067397] [7c5487d9c02b:15965:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5917884Z [1673472326.067397] [7c5487d9c02b:15965:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5918261Z STAGE:2023-01-11 21:25:26 15965:15965 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5918609Z STAGE:2023-01-11 21:25:26 15965:15965 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5918879Z [1673472324.431927] [7c5487d9c02b:15964:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.5919105Z [1673472326.052418] [7c5487d9c02b:15964:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.5919333Z [1673472326.052418] [7c5487d9c02b:15964:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.5919662Z STAGE:2023-01-11 21:25:26 15964:15964 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5920010Z STAGE:2023-01-11 21:25:26 15964:15964 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5920379Z STAGE:2023-01-11 21:25:26 15965:15965 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5920708Z STAGE:2023-01-11 21:25:26 15964:15964 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.5921034Z STAGE:2023-01-11 21:25:26 15965:15965 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5921342Z STAGE:2023-01-11 21:25:26 15964:15964 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.5921685Z STAGE:2023-01-11 21:25:26 15965:15965 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5922024Z STAGE:2023-01-11 21:25:26 15964:15964 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.5922125Z ok (6.559s) 2023-01-11T21:44:28.5922144Z 2023-01-11T21:44:28.5922405Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5922520Z Ran 1 test in 6.559s 2023-01-11T21:44:28.5922539Z 2023-01-11T21:44:28.5922630Z OK 2023-01-11T21:44:28.5922649Z 2023-01-11T21:44:28.5922772Z Generating XML reports... 
2023-01-11T21:44:28.5923198Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212520.xml 2023-01-11T21:44:28.5923561Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5923734Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5924110Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5924297Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5924318Z 2023-01-11T21:44:28.5924423Z Running tests... 2023-01-11T21:44:28.5924680Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5924988Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5925285Z test_all_reduce_sum_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Gloo and NCCL backends will have CUDA allReduce tested (0.002s) 2023-01-11T21:44:28.5925305Z 2023-01-11T21:44:28.5925547Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5925660Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5925679Z 2023-01-11T21:44:28.5925783Z OK (skipped=1) 2023-01-11T21:44:28.5925802Z 2023-01-11T21:44:28.5925921Z Generating XML reports... 
2023-01-11T21:44:28.5926361Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212529.xml 2023-01-11T21:44:28.5926724Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5926955Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5927333Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5927527Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5927547Z 2023-01-11T21:44:28.5927638Z Running tests... 2023-01-11T21:44:28.5927897Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5928201Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5928501Z test_all_reduce_sum_cuda_async (__main__.TestDistBackendWithSpawn) ... skip: Only Gloo and NCCL backends will have CUDA allReduce tested (0.002s) 2023-01-11T21:44:28.5928520Z 2023-01-11T21:44:28.5928776Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5928885Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5928904Z 2023-01-11T21:44:28.5929009Z OK (skipped=1) 2023-01-11T21:44:28.5929031Z 2023-01-11T21:44:28.5929151Z Generating XML reports... 
2023-01-11T21:44:28.5929632Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212532.xml 2023-01-11T21:44:28.5929987Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5930164Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5930537Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5930728Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5930747Z 2023-01-11T21:44:28.5930857Z Running tests... 2023-01-11T21:44:28.5931115Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5931418Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5931726Z test_all_reduce_sum_cuda_complex (__main__.TestDistBackendWithSpawn) ... skip: Only Gloo and NCCL backends will have CUDA allReduce tested (0.002s) 2023-01-11T21:44:28.5931746Z 2023-01-11T21:44:28.5932005Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5932098Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5932118Z 2023-01-11T21:44:28.5932223Z OK (skipped=1) 2023-01-11T21:44:28.5932242Z 2023-01-11T21:44:28.5932361Z Generating XML reports... 
2023-01-11T21:44:28.5932793Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212534.xml 2023-01-11T21:44:28.5933152Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5933326Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5933697Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5933885Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5933904Z 2023-01-11T21:44:28.5934010Z Running tests... 2023-01-11T21:44:28.5934254Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5934554Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5934795Z test_all_to_all (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports all_to_all (0.002s) 2023-01-11T21:44:28.5934815Z 2023-01-11T21:44:28.5935071Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5935180Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5935199Z 2023-01-11T21:44:28.5935307Z OK (skipped=1) 2023-01-11T21:44:28.5935326Z 2023-01-11T21:44:28.5935446Z Generating XML reports... 
2023-01-11T21:44:28.5935877Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212536.xml 2023-01-11T21:44:28.5936283Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5936465Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5937088Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5937277Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5937295Z 2023-01-11T21:44:28.5937403Z Running tests... 2023-01-11T21:44:28.5937667Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5937970Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5938223Z test_all_to_all_complex (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports all_to_all (0.002s) 2023-01-11T21:44:28.5938242Z 2023-01-11T21:44:28.5938506Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5938599Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5938619Z 2023-01-11T21:44:28.5938727Z OK (skipped=1) 2023-01-11T21:44:28.5938745Z 2023-01-11T21:44:28.5938942Z Generating XML reports... 
2023-01-11T21:44:28.5939395Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212539.xml 2023-01-11T21:44:28.5939759Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5939932Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5940306Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5940494Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5940513Z 2023-01-11T21:44:28.5940620Z Running tests... 2023-01-11T21:44:28.5940865Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5941167Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5941425Z test_all_to_all_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only NCCL supports CUDA all_to_all (0.002s) 2023-01-11T21:44:28.5941444Z 2023-01-11T21:44:28.5941699Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5941807Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5941826Z 2023-01-11T21:44:28.5941932Z OK (skipped=1) 2023-01-11T21:44:28.5941951Z 2023-01-11T21:44:28.5942072Z Generating XML reports... 
2023-01-11T21:44:28.5942508Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212541.xml 2023-01-11T21:44:28.5942855Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5943027Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5943402Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5943591Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5943610Z 2023-01-11T21:44:28.5943718Z Running tests... 2023-01-11T21:44:28.5943975Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5944279Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5944543Z test_all_to_all_cuda_complex (__main__.TestDistBackendWithSpawn) ... skip: Only NCCL supports CUDA all_to_all (0.002s) 2023-01-11T21:44:28.5944563Z 2023-01-11T21:44:28.5944820Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5944914Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5944934Z 2023-01-11T21:44:28.5945039Z OK (skipped=1) 2023-01-11T21:44:28.5945129Z 2023-01-11T21:44:28.5945261Z Generating XML reports... 
2023-01-11T21:44:28.5945702Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212543.xml 2023-01-11T21:44:28.5946072Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5946246Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5946619Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5946806Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5946825Z 2023-01-11T21:44:28.5946932Z Running tests... 2023-01-11T21:44:28.5947172Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5947476Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5947732Z test_all_to_all_full_group (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports all_to_all (0.002s) 2023-01-11T21:44:28.5947752Z 2023-01-11T21:44:28.5948056Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5948174Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5948193Z 2023-01-11T21:44:28.5948300Z OK (skipped=1) 2023-01-11T21:44:28.5948319Z 2023-01-11T21:44:28.5948443Z Generating XML reports... 
2023-01-11T21:44:28.5948883Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212546.xml 2023-01-11T21:44:28.5949233Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5949405Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5949776Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5949966Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5949986Z 2023-01-11T21:44:28.5950090Z Running tests... 2023-01-11T21:44:28.5950348Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5950655Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5950923Z test_all_to_all_full_group_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only NCCL supports CUDA all_to_all (0.002s) 2023-01-11T21:44:28.5950942Z 2023-01-11T21:44:28.5951197Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5951291Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5951310Z 2023-01-11T21:44:28.5951416Z OK (skipped=1) 2023-01-11T21:44:28.5951435Z 2023-01-11T21:44:28.5951555Z Generating XML reports... 
2023-01-11T21:44:28.5951990Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212548.xml 2023-01-11T21:44:28.5952357Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5952529Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5952903Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5953090Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5953109Z 2023-01-11T21:44:28.5953216Z Running tests... 2023-01-11T21:44:28.5953456Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5953819Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5954069Z test_all_to_all_group (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports all_to_all (0.002s) 2023-01-11T21:44:28.5954088Z 2023-01-11T21:44:28.5954347Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5954519Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5954538Z 2023-01-11T21:44:28.5954646Z OK (skipped=1) 2023-01-11T21:44:28.5954665Z 2023-01-11T21:44:28.5954791Z Generating XML reports... 
2023-01-11T21:44:28.5955233Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212551.xml 2023-01-11T21:44:28.5955597Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5955752Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5956123Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5956310Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5956329Z 2023-01-11T21:44:28.5956435Z Running tests... 2023-01-11T21:44:28.5956690Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5956998Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5957378Z test_all_to_all_group_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all_to_all_single (0.002s) 2023-01-11T21:44:28.5957401Z 2023-01-11T21:44:28.5957671Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5957764Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5957802Z 2023-01-11T21:44:28.5957891Z OK (skipped=1) 2023-01-11T21:44:28.5957910Z 2023-01-11T21:44:28.5958032Z Generating XML reports... 
2023-01-11T21:44:28.5958467Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212553.xml 2023-01-11T21:44:28.5958828Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5959000Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5959377Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5959568Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5959588Z 2023-01-11T21:44:28.5959695Z Running tests... 2023-01-11T21:44:28.5959934Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5960242Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5960520Z test_all_to_all_single_equal_split (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports CPU all_to_all_single (0.002s) 2023-01-11T21:44:28.5960541Z 2023-01-11T21:44:28.5960797Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5960909Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5960928Z 2023-01-11T21:44:28.5961033Z OK (skipped=1) 2023-01-11T21:44:28.5961053Z 2023-01-11T21:44:28.5961178Z Generating XML reports... 
2023-01-11T21:44:28.5961610Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212555.xml 2023-01-11T21:44:28.5961974Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5962129Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5962499Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5962684Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5962703Z 2023-01-11T21:44:28.5962808Z Running tests... 2023-01-11T21:44:28.5963062Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5963365Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5963653Z test_all_to_all_single_equal_split_complex (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports CPU all_to_all_single (0.002s) 2023-01-11T21:44:28.5963724Z 2023-01-11T21:44:28.5963989Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5964107Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5964127Z 2023-01-11T21:44:28.5964215Z OK (skipped=1) 2023-01-11T21:44:28.5964234Z 2023-01-11T21:44:28.5964356Z Generating XML reports... 
2023-01-11T21:44:28.5964792Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212558.xml 2023-01-11T21:44:28.5965154Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5965325Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5965698Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5965889Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5965908Z 2023-01-11T21:44:28.5966014Z Running tests... 2023-01-11T21:44:28.5966298Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5966614Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5966902Z test_all_to_all_single_equal_split_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all_to_all_single (0.002s) 2023-01-11T21:44:28.5966922Z 2023-01-11T21:44:28.5967179Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5967289Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5967308Z 2023-01-11T21:44:28.5967412Z OK (skipped=1) 2023-01-11T21:44:28.5967431Z 2023-01-11T21:44:28.5967552Z Generating XML reports... 
2023-01-11T21:44:28.5967986Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212600.xml 2023-01-11T21:44:28.5968354Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5968511Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5968880Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5969066Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5969085Z 2023-01-11T21:44:28.5969190Z Running tests... 2023-01-11T21:44:28.5969448Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5969754Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5970048Z test_all_to_all_single_equal_split_cuda_complex (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all_to_all_single (0.002s) 2023-01-11T21:44:28.5970069Z 2023-01-11T21:44:28.5970325Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5970435Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5970454Z 2023-01-11T21:44:28.5970543Z OK (skipped=1) 2023-01-11T21:44:28.5970561Z 2023-01-11T21:44:28.5970686Z Generating XML reports... 
2023-01-11T21:44:28.5971117Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212602.xml 2023-01-11T21:44:28.5971482Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5971650Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5972018Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5972204Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5972223Z 2023-01-11T21:44:28.5972332Z Running tests... 2023-01-11T21:44:28.5972666Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5972952Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5973248Z test_all_to_all_single_equal_split_full_group (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports CPU all_to_all_single (0.002s) 2023-01-11T21:44:28.5973268Z 2023-01-11T21:44:28.5973522Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5973635Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5973655Z 2023-01-11T21:44:28.5973759Z OK (skipped=1) 2023-01-11T21:44:28.5973778Z 2023-01-11T21:44:28.5973899Z Generating XML reports... 
2023-01-11T21:44:28.5974335Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212605.xml 2023-01-11T21:44:28.5974697Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5974875Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5975234Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5975470Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5975491Z 2023-01-11T21:44:28.5975604Z Running tests... 2023-01-11T21:44:28.5975866Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5976173Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5976475Z test_all_to_all_single_equal_split_full_group_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all_to_all_single (0.002s) 2023-01-11T21:44:28.5976495Z 2023-01-11T21:44:28.5976995Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5977111Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5977132Z 2023-01-11T21:44:28.5977231Z OK (skipped=1) 2023-01-11T21:44:28.5977275Z 2023-01-11T21:44:28.5977376Z Generating XML reports... 
2023-01-11T21:44:28.5977820Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212607.xml 2023-01-11T21:44:28.5978182Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5978354Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5978729Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5978920Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5978941Z 2023-01-11T21:44:28.5979048Z Running tests... 2023-01-11T21:44:28.5979305Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5979592Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5979884Z test_all_to_all_single_equal_split_group (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports CPU all_to_all_single (0.002s) 2023-01-11T21:44:28.5979903Z 2023-01-11T21:44:28.5980163Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5980274Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5980293Z 2023-01-11T21:44:28.5980397Z OK (skipped=1) 2023-01-11T21:44:28.5980415Z 2023-01-11T21:44:28.5980535Z Generating XML reports... 
2023-01-11T21:44:28.5980967Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212610.xml 2023-01-11T21:44:28.5981328Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5981500Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5981855Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5982129Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5982149Z 2023-01-11T21:44:28.5982257Z Running tests... 2023-01-11T21:44:28.5982523Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5982829Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5983125Z test_all_to_all_single_equal_split_group_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all_to_all_single (0.002s) 2023-01-11T21:44:28.5983145Z 2023-01-11T21:44:28.5983399Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5983509Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5983528Z 2023-01-11T21:44:28.5983634Z OK (skipped=1) 2023-01-11T21:44:28.5983653Z 2023-01-11T21:44:28.5983756Z Generating XML reports... 
2023-01-11T21:44:28.5984192Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212612.xml 2023-01-11T21:44:28.5984618Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5984801Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5985180Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5985369Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5985388Z 2023-01-11T21:44:28.5985496Z Running tests... 2023-01-11T21:44:28.5985755Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5986044Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5986326Z test_all_to_all_single_unequal_split (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports CPU all_to_all_single (0.002s) 2023-01-11T21:44:28.5986350Z 2023-01-11T21:44:28.5986608Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5986718Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5986737Z 2023-01-11T21:44:28.5986845Z OK (skipped=1) 2023-01-11T21:44:28.5986864Z 2023-01-11T21:44:28.5986984Z Generating XML reports... 
2023-01-11T21:44:28.5987417Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212614.xml 2023-01-11T21:44:28.5987777Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5987947Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5988298Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5988484Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5988507Z 2023-01-11T21:44:28.5988614Z Running tests... 2023-01-11T21:44:28.5988870Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5989174Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5989466Z test_all_to_all_single_unequal_split_complex (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports CPU all_to_all_single (0.002s) 2023-01-11T21:44:28.5989485Z 2023-01-11T21:44:28.5989742Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5989851Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5989870Z 2023-01-11T21:44:28.5989975Z OK (skipped=1) 2023-01-11T21:44:28.5989994Z 2023-01-11T21:44:28.5990099Z Generating XML reports... 
2023-01-11T21:44:28.5990534Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212617.xml 2023-01-11T21:44:28.5990896Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5991128Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5991511Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5991699Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5991719Z 2023-01-11T21:44:28.5991828Z Running tests... 2023-01-11T21:44:28.5992090Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5992396Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5992671Z test_all_to_all_single_unequal_split_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all_to_all_single (0.002s) 2023-01-11T21:44:28.5992691Z 2023-01-11T21:44:28.5992950Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5993064Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5993084Z 2023-01-11T21:44:28.5993188Z OK (skipped=1) 2023-01-11T21:44:28.5993207Z 2023-01-11T21:44:28.5993328Z Generating XML reports... 
2023-01-11T21:44:28.5993807Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212619.xml 2023-01-11T21:44:28.5994180Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5994355Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5994724Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5994893Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5994912Z 2023-01-11T21:44:28.5995017Z Running tests... 2023-01-11T21:44:28.5995276Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5995586Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5995892Z test_all_to_all_single_unequal_split_cuda_complex (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all_to_all_single (0.002s) 2023-01-11T21:44:28.5995912Z 2023-01-11T21:44:28.5996169Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5996278Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5996297Z 2023-01-11T21:44:28.5996402Z OK (skipped=1) 2023-01-11T21:44:28.5996422Z 2023-01-11T21:44:28.5996541Z Generating XML reports... 
2023-01-11T21:44:28.5996956Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212622.xml 2023-01-11T21:44:28.5997318Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.5997488Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.5997858Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.5998047Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.5998069Z 2023-01-11T21:44:28.5998178Z Running tests... 2023-01-11T21:44:28.5998435Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5998737Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.5999017Z test_all_to_all_single_unequal_split_full_group (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports CPU all_to_all_single (0.002s) 2023-01-11T21:44:28.5999056Z 2023-01-11T21:44:28.5999292Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.5999403Z Ran 1 test in 0.002s 2023-01-11T21:44:28.5999423Z 2023-01-11T21:44:28.5999529Z OK (skipped=1) 2023-01-11T21:44:28.5999548Z 2023-01-11T21:44:28.5999668Z Generating XML reports... 
2023-01-11T21:44:28.6000170Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212624.xml 2023-01-11T21:44:28.6000539Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6000715Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6001090Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6001258Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6001294Z 2023-01-11T21:44:28.6001384Z Running tests... 2023-01-11T21:44:28.6001641Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6001945Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6002245Z test_all_to_all_single_unequal_split_full_group_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all_to_all_single (0.002s) 2023-01-11T21:44:28.6002268Z 2023-01-11T21:44:28.6002521Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6002678Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6002699Z 2023-01-11T21:44:28.6002811Z OK (skipped=1) 2023-01-11T21:44:28.6002830Z 2023-01-11T21:44:28.6002953Z Generating XML reports... 
2023-01-11T21:44:28.6003370Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212626.xml 2023-01-11T21:44:28.6003736Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6003910Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6004281Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6004472Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6004492Z 2023-01-11T21:44:28.6004598Z Running tests... 2023-01-11T21:44:28.6004858Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6005162Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6005450Z test_all_to_all_single_unequal_split_group (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports CPU all_to_all_single (0.002s) 2023-01-11T21:44:28.6005469Z 2023-01-11T21:44:28.6005708Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6005817Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6005837Z 2023-01-11T21:44:28.6005942Z OK (skipped=1) 2023-01-11T21:44:28.6005961Z 2023-01-11T21:44:28.6006081Z Generating XML reports... 
2023-01-11T21:44:28.6006514Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212629.xml 2023-01-11T21:44:28.6006879Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6007056Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6007434Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6007622Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6007641Z 2023-01-11T21:44:28.6007730Z Running tests... 2023-01-11T21:44:28.6007988Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6008294Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6008592Z test_all_to_all_single_unequal_split_group_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all_to_all_single (0.002s) 2023-01-11T21:44:28.6008612Z 2023-01-11T21:44:28.6008928Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6009041Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6009060Z 2023-01-11T21:44:28.6009170Z OK (skipped=1) 2023-01-11T21:44:28.6009189Z 2023-01-11T21:44:28.6009315Z Generating XML reports... 
2023-01-11T21:44:28.6009733Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212631.xml 2023-01-11T21:44:28.6010094Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6010266Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6010638Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6010824Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6010843Z 2023-01-11T21:44:28.6010949Z Running tests... 2023-01-11T21:44:28.6011208Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6011512Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6011821Z test_average_parameters (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6012027Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 16969 2023-01-11T21:44:28.6012244Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 16970 2023-01-11T21:44:28.6012614Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6012786Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6013159Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6013349Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6013712Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6013888Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6014242Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6014428Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6014669Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6014909Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6015305Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6015694Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6015923Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6016152Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6016424Z [1673472400.235784] [7c5487d9c02b:16969:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6016874Z [1673472400.249828] [7c5487d9c02b:16969:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6017100Z [1673472400.249828] [7c5487d9c02b:16969:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6017374Z [1673472400.242492] [7c5487d9c02b:16970:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6017598Z [1673472400.255724] [7c5487d9c02b:16970:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6017919Z [1673472400.255724] [7c5487d9c02b:16970:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6018163Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:44:28.6018403Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:44:28.6018812Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.6019205Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 
2023-01-11T21:44:28.6019308Z ok (7.323s) 2023-01-11T21:44:28.6019329Z 2023-01-11T21:44:28.6019572Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6019687Z Ran 1 test in 7.323s 2023-01-11T21:44:28.6019706Z 2023-01-11T21:44:28.6019796Z OK 2023-01-11T21:44:28.6019817Z 2023-01-11T21:44:28.6019943Z Generating XML reports... 2023-01-11T21:44:28.6020443Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212634.xml 2023-01-11T21:44:28.6020822Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6020997Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6021369Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6021560Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6021579Z 2023-01-11T21:44:28.6021669Z Running tests... 2023-01-11T21:44:28.6021932Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6022242Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6022498Z test_backend_full_group (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6022718Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 17093 2023-01-11T21:44:28.6022932Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 17094 2023-01-11T21:44:28.6023297Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6023468Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6023829Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6024018Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6024376Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6024548Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6024928Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6025113Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6025356Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6025596Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6025990Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6026365Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6026589Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6026874Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6027029Z skip: Need at least 3 CUDA devices (4.239s) 2023-01-11T21:44:28.6027049Z 2023-01-11T21:44:28.6027311Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6027423Z Ran 1 test in 4.239s 2023-01-11T21:44:28.6027443Z 2023-01-11T21:44:28.6027549Z OK (skipped=1) 2023-01-11T21:44:28.6027568Z 2023-01-11T21:44:28.6027691Z Generating XML reports... 2023-01-11T21:44:28.6028132Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212643.xml 2023-01-11T21:44:28.6028478Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6028656Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6029028Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6029218Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6029284Z 2023-01-11T21:44:28.6029400Z Running tests... 2023-01-11T21:44:28.6029668Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6029976Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6030223Z test_backend_group (__main__.TestDistBackendWithSpawn) ... 
skip: Test requires world size of 3 (0.002s) 2023-01-11T21:44:28.6030243Z 2023-01-11T21:44:28.6030503Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6030597Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6030616Z 2023-01-11T21:44:28.6030723Z OK (skipped=1) 2023-01-11T21:44:28.6030742Z 2023-01-11T21:44:28.6036046Z Generating XML reports... 2023-01-11T21:44:28.6036533Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212650.xml 2023-01-11T21:44:28.6036915Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6037091Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6037466Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6037656Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6037677Z 2023-01-11T21:44:28.6037785Z Running tests... 2023-01-11T21:44:28.6038030Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6038338Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6038585Z test_barrier (__main__.TestDistBackendWithSpawn) ... skip: ucc does not support CPU barrier (0.002s) 2023-01-11T21:44:28.6038608Z 2023-01-11T21:44:28.6038872Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6038983Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6039003Z 2023-01-11T21:44:28.6039112Z OK (skipped=1) 2023-01-11T21:44:28.6039132Z 2023-01-11T21:44:28.6039253Z Generating XML reports... 
2023-01-11T21:44:28.6039694Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212653.xml 2023-01-11T21:44:28.6040043Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6040215Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6040589Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6040779Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6040798Z 2023-01-11T21:44:28.6041006Z Running tests... 2023-01-11T21:44:28.6041270Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6041582Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6041833Z test_barrier_cuda (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6042051Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 17262 2023-01-11T21:44:28.6042248Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 17263 2023-01-11T21:44:28.6042613Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6042787Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6043160Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6043352Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6043709Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6043928Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6044315Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6044482Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6044722Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6044964Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6045358Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6045750Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6045984Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6046213Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6046485Z [1673472420.665272] [7c5487d9c02b:17263:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6046715Z [1673472420.678503] [7c5487d9c02b:17263:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6046947Z [1673472420.678503] [7c5487d9c02b:17263:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6047200Z [1673472420.659169] [7c5487d9c02b:17262:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6047429Z [1673472420.672997] [7c5487d9c02b:17262:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6047667Z [1673472420.672997] [7c5487d9c02b:17262:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6047769Z ok (6.963s) 2023-01-11T21:44:28.6047789Z 2023-01-11T21:44:28.6048054Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6048166Z Ran 1 test in 6.963s 2023-01-11T21:44:28.6048185Z 2023-01-11T21:44:28.6048276Z OK 2023-01-11T21:44:28.6048295Z 2023-01-11T21:44:28.6048417Z Generating XML reports... 
2023-01-11T21:44:28.6048860Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212655.xml 2023-01-11T21:44:28.6049209Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6049443Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6049822Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6050017Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6050036Z 2023-01-11T21:44:28.6050145Z Running tests... 2023-01-11T21:44:28.6050405Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6050714Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6050974Z test_barrier_full_group (__main__.TestDistBackendWithSpawn) ... skip: ucc does not support CPU barrier (0.002s) 2023-01-11T21:44:28.6050993Z 2023-01-11T21:44:28.6051250Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6051344Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6051363Z 2023-01-11T21:44:28.6051470Z OK (skipped=1) 2023-01-11T21:44:28.6051492Z 2023-01-11T21:44:28.6051613Z Generating XML reports... 
2023-01-11T21:44:28.6052052Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212704.xml 2023-01-11T21:44:28.6052466Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6052647Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6053025Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6053214Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6053234Z 2023-01-11T21:44:28.6053322Z Running tests... 2023-01-11T21:44:28.6053636Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6053950Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6054217Z test_barrier_full_group_cuda (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6054438Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 17409 2023-01-11T21:44:28.6054660Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 17410 2023-01-11T21:44:28.6055026Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6055200Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6055557Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6055745Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6056109Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6056281Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6057037Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6057241Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6057489Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6057733Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6058142Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6058517Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6058746Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6058974Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6059240Z skip: Skipped due to small world size. (4.185s) 2023-01-11T21:44:28.6059260Z 2023-01-11T21:44:28.6059532Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6059645Z Ran 1 test in 4.185s 2023-01-11T21:44:28.6059664Z 2023-01-11T21:44:28.6059771Z OK (skipped=1) 2023-01-11T21:44:28.6059790Z 2023-01-11T21:44:28.6059913Z Generating XML reports... 2023-01-11T21:44:28.6060355Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212707.xml 2023-01-11T21:44:28.6060702Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6060875Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6061249Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6061440Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6061459Z 2023-01-11T21:44:28.6061566Z Running tests... 2023-01-11T21:44:28.6061897Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6062218Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6062476Z test_barrier_group (__main__.TestDistBackendWithSpawn) ... 
skip: ucc does not support CPU barrier (0.002s) 2023-01-11T21:44:28.6062497Z 2023-01-11T21:44:28.6062754Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6062847Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6062867Z 2023-01-11T21:44:28.6062972Z OK (skipped=1) 2023-01-11T21:44:28.6062991Z 2023-01-11T21:44:28.6063114Z Generating XML reports... 2023-01-11T21:44:28.6063551Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212714.xml 2023-01-11T21:44:28.6063921Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6064097Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6064469Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6064656Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6064675Z 2023-01-11T21:44:28.6064765Z Running tests... 2023-01-11T21:44:28.6065025Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6065328Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6065583Z test_barrier_group_cuda (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6065799Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 17545 2023-01-11T21:44:28.6066019Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 17546 2023-01-11T21:44:28.6066384Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6066557Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6066932Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6067103Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6067460Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6067634Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6068002Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6068244Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6068490Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6068737Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6069136Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6069511Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6069738Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6069961Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6070119Z skip: Skipped due to small world size. (4.207s) 2023-01-11T21:44:28.6070139Z 2023-01-11T21:44:28.6070402Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6070517Z Ran 1 test in 4.207s 2023-01-11T21:44:28.6070537Z 2023-01-11T21:44:28.6070644Z OK (skipped=1) 2023-01-11T21:44:28.6070663Z 2023-01-11T21:44:28.6070832Z Generating XML reports... 2023-01-11T21:44:28.6071284Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212716.xml 2023-01-11T21:44:28.6071629Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6071806Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6072182Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6072370Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6072390Z 2023-01-11T21:44:28.6072499Z Running tests... 2023-01-11T21:44:28.6072763Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6073073Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6073353Z test_barrier_timeout_full_group (__main__.TestDistBackendWithSpawn) ... 
skip: Only gloo backend supports timeouts (0.002s) 2023-01-11T21:44:28.6073373Z 2023-01-11T21:44:28.6073629Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6073723Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6073742Z 2023-01-11T21:44:28.6073848Z OK (skipped=1) 2023-01-11T21:44:28.6073867Z 2023-01-11T21:44:28.6073989Z Generating XML reports... 2023-01-11T21:44:28.6074427Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212723.xml 2023-01-11T21:44:28.6074790Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6074962Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6075338Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6075529Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6075549Z 2023-01-11T21:44:28.6075656Z Running tests... 2023-01-11T21:44:28.6075897Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6076202Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6076472Z test_barrier_timeout_global (__main__.TestDistBackendWithSpawn) ... skip: Only gloo backend supports timeouts (0.002s) 2023-01-11T21:44:28.6076491Z 2023-01-11T21:44:28.6076746Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6076855Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6076874Z 2023-01-11T21:44:28.6076979Z OK (skipped=1) 2023-01-11T21:44:28.6077050Z 2023-01-11T21:44:28.6077178Z Generating XML reports... 
2023-01-11T21:44:28.6077620Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212725.xml 2023-01-11T21:44:28.6077973Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6078147Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6078520Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6078712Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6078731Z 2023-01-11T21:44:28.6078838Z Running tests... 2023-01-11T21:44:28.6079095Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6079401Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6079673Z test_barrier_timeout_group (__main__.TestDistBackendWithSpawn) ... skip: Only gloo backend supports timeouts (0.002s) 2023-01-11T21:44:28.6079693Z 2023-01-11T21:44:28.6080003Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6080101Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6080121Z 2023-01-11T21:44:28.6080229Z OK (skipped=1) 2023-01-11T21:44:28.6080248Z 2023-01-11T21:44:28.6080372Z Generating XML reports... 
2023-01-11T21:44:28.6080813Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212727.xml 2023-01-11T21:44:28.6081177Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6081349Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6081722Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6081915Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6081935Z 2023-01-11T21:44:28.6082042Z Running tests... 2023-01-11T21:44:28.6082286Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6082593Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6082846Z test_batch_isend_irecv_gloo (__main__.TestDistBackendWithSpawn) ... skip: GLOO Batch Send Recv CPU (0.002s) 2023-01-11T21:44:28.6082864Z 2023-01-11T21:44:28.6083125Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6083235Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6083254Z 2023-01-11T21:44:28.6083359Z OK (skipped=1) 2023-01-11T21:44:28.6083378Z 2023-01-11T21:44:28.6083500Z Generating XML reports... 
2023-01-11T21:44:28.6083935Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212730.xml 2023-01-11T21:44:28.6084299Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6084455Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6084831Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6085020Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6085039Z 2023-01-11T21:44:28.6085146Z Running tests... 2023-01-11T21:44:28.6085404Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6085707Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6085965Z test_batch_isend_irecv_gloo_tags (__main__.TestDistBackendWithSpawn) ... skip: GLOO Batch Send Recv CPU (0.002s) 2023-01-11T21:44:28.6085984Z 2023-01-11T21:44:28.6086240Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6086390Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6086428Z 2023-01-11T21:44:28.6086517Z OK (skipped=1) 2023-01-11T21:44:28.6086535Z 2023-01-11T21:44:28.6086656Z Generating XML reports... 
2023-01-11T21:44:28.6087103Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212732.xml 2023-01-11T21:44:28.6087470Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6087643Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6088016Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6088207Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6088226Z 2023-01-11T21:44:28.6088333Z Running tests... 2023-01-11T21:44:28.6088574Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6088884Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6089202Z test_batch_isend_irecv_mixed_backend_err (__main__.TestDistBackendWithSpawn) ... skip: NCCL Batch Send Recv Only (0.002s) 2023-01-11T21:44:28.6089223Z 2023-01-11T21:44:28.6089487Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6089600Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6089618Z 2023-01-11T21:44:28.6089727Z OK (skipped=1) 2023-01-11T21:44:28.6089746Z 2023-01-11T21:44:28.6089867Z Generating XML reports... 
2023-01-11T21:44:28.6090308Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212734.xml 2023-01-11T21:44:28.6090672Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6090829Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6091203Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6091394Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6091414Z 2023-01-11T21:44:28.6091521Z Running tests... 2023-01-11T21:44:28.6091779Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6092086Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6092335Z test_batch_isend_irecv_nccl (__main__.TestDistBackendWithSpawn) ... skip: NCCL Batch Send Recv Only (0.003s) 2023-01-11T21:44:28.6092355Z 2023-01-11T21:44:28.6092612Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6092706Z Ran 1 test in 0.003s 2023-01-11T21:44:28.6092741Z 2023-01-11T21:44:28.6092829Z OK (skipped=1) 2023-01-11T21:44:28.6092848Z 2023-01-11T21:44:28.6092970Z Generating XML reports... 
2023-01-11T21:44:28.6093409Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212737.xml 2023-01-11T21:44:28.6093777Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6093950Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6094322Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6094509Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6094529Z 2023-01-11T21:44:28.6094634Z Running tests... 2023-01-11T21:44:28.6094874Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6095181Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6095445Z test_batch_isend_irecv_no_rank_zero_nccl (__main__.TestDistBackendWithSpawn) ... skip: NCCL Batch Send Recv Only (0.002s) 2023-01-11T21:44:28.6095536Z 2023-01-11T21:44:28.6095803Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6095920Z Ran 1 test in 0.003s 2023-01-11T21:44:28.6095939Z 2023-01-11T21:44:28.6096047Z OK (skipped=1) 2023-01-11T21:44:28.6096066Z 2023-01-11T21:44:28.6096188Z Generating XML reports... 
2023-01-11T21:44:28.6096812Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212739.xml 2023-01-11T21:44:28.6097195Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6097353Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6097725Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6097915Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6097939Z 2023-01-11T21:44:28.6098047Z Running tests... 2023-01-11T21:44:28.6098306Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6098694Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6098961Z test_batch_isend_irecv_op_err (__main__.TestDistBackendWithSpawn) ... skip: NCCL Batch Send Recv Only (0.002s) 2023-01-11T21:44:28.6098981Z 2023-01-11T21:44:28.6099244Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6099356Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6099376Z 2023-01-11T21:44:28.6099463Z OK (skipped=1) 2023-01-11T21:44:28.6099482Z 2023-01-11T21:44:28.6099606Z Generating XML reports... 
2023-01-11T21:44:28.6100046Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212742.xml 2023-01-11T21:44:28.6100409Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6100589Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6100970Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6101158Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6101178Z 2023-01-11T21:44:28.6101284Z Running tests... 2023-01-11T21:44:28.6101526Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6101831Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6102091Z test_batch_isend_irecv_op_list_err (__main__.TestDistBackendWithSpawn) ... skip: NCCL Batch Send Recv Only (0.002s) 2023-01-11T21:44:28.6102110Z 2023-01-11T21:44:28.6102365Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6102478Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6102497Z 2023-01-11T21:44:28.6102606Z OK (skipped=1) 2023-01-11T21:44:28.6102625Z 2023-01-11T21:44:28.6102748Z Generating XML reports... 
2023-01-11T21:44:28.6103187Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212744.xml 2023-01-11T21:44:28.6103553Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6103709Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6104083Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6104272Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6104291Z 2023-01-11T21:44:28.6104398Z Running tests... 2023-01-11T21:44:28.6104657Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6105042Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6105320Z test_batch_isend_irecv_ring_exchange_nccl (__main__.TestDistBackendWithSpawn) ... skip: NCCL Batch Send Recv Only (0.002s) 2023-01-11T21:44:28.6105340Z 2023-01-11T21:44:28.6105598Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6105709Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6105728Z 2023-01-11T21:44:28.6105816Z OK (skipped=1) 2023-01-11T21:44:28.6105835Z 2023-01-11T21:44:28.6105961Z Generating XML reports... 
2023-01-11T21:44:28.6106397Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212746.xml 2023-01-11T21:44:28.6106763Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6106937Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6107311Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6107504Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6107573Z 2023-01-11T21:44:28.6107689Z Running tests... 2023-01-11T21:44:28.6107933Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6108240Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6108498Z test_batch_isend_irecv_self_nccl (__main__.TestDistBackendWithSpawn) ... skip: NCCL Batch Send Recv Only (0.002s) 2023-01-11T21:44:28.6108518Z 2023-01-11T21:44:28.6108775Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6108887Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6108906Z 2023-01-11T21:44:28.6109013Z OK (skipped=1) 2023-01-11T21:44:28.6109031Z 2023-01-11T21:44:28.6109155Z Generating XML reports... 
2023-01-11T21:44:28.6109595Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212749.xml 2023-01-11T21:44:28.6109959Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6110114Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6110489Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6110677Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6110696Z 2023-01-11T21:44:28.6110802Z Running tests... 2023-01-11T21:44:28.6111060Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6111364Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6111626Z test_batch_isend_irecv_tensor_err (__main__.TestDistBackendWithSpawn) ... skip: NCCL Batch Send Recv Only (0.002s) 2023-01-11T21:44:28.6111648Z 2023-01-11T21:44:28.6111902Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6112011Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6112030Z 2023-01-11T21:44:28.6112121Z OK (skipped=1) 2023-01-11T21:44:28.6112139Z 2023-01-11T21:44:28.6112259Z Generating XML reports... 
2023-01-11T21:44:28.6112692Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212751.xml 2023-01-11T21:44:28.6113052Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6113224Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6113594Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6113780Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6113852Z 2023-01-11T21:44:28.6113963Z Running tests... 2023-01-11T21:44:28.6114228Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6114520Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6114763Z test_broadcast (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6114981Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 18077 2023-01-11T21:44:28.6115196Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 18078 2023-01-11T21:44:28.6115560Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6115732Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6116101Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6116292Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6116636Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6116854Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6117240Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6117427Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6117671Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6117915Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6118309Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6118701Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6118931Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6119250Z STAGE:2023-01-11 21:27:57 18078:18078 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6119472Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6119796Z STAGE:2023-01-11 21:27:57 18077:18077 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6120069Z [1673472477.939900] [7c5487d9c02b:18077:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6120297Z [1673472479.583163] [7c5487d9c02b:18077:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6120532Z [1673472479.583163] [7c5487d9c02b:18077:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6120808Z [1673472477.941490] [7c5487d9c02b:18078:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.6121040Z [1673472479.583697] [7c5487d9c02b:18078:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6121275Z [1673472479.583697] [7c5487d9c02b:18078:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6121823Z STAGE:2023-01-11 21:27:59 18077:18077 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:27:59 18078:18078 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6121844Z 2023-01-11T21:44:28.6122175Z STAGE:2023-01-11 21:27:59 18078:18078 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6122577Z STAGE:2023-01-11 21:27:59 18077:18077 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6122909Z STAGE:2023-01-11 21:28:00 18077:18077 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6123234Z STAGE:2023-01-11 21:28:00 18078:18078 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6123564Z STAGE:2023-01-11 21:28:00 18077:18077 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6123885Z STAGE:2023-01-11 21:28:00 18078:18078 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6124226Z STAGE:2023-01-11 21:28:00 18077:18077 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6124568Z STAGE:2023-01-11 21:28:00 18078:18078 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6124893Z STAGE:2023-01-11 21:28:00 18078:18078 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6125197Z STAGE:2023-01-11 21:28:00 18077:18077 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6125575Z STAGE:2023-01-11 21:28:00 18078:18078 ActivityProfilerController.cpp:306] Completed Stage: 
Collection 2023-01-11T21:44:28.6125908Z STAGE:2023-01-11 21:28:00 18077:18077 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6126250Z STAGE:2023-01-11 21:28:00 18078:18078 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6126590Z STAGE:2023-01-11 21:28:00 18077:18077 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6126915Z STAGE:2023-01-11 21:28:00 18077:18077 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6127233Z STAGE:2023-01-11 21:28:00 18078:18078 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6127559Z STAGE:2023-01-11 21:28:00 18077:18077 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6127885Z STAGE:2023-01-11 21:28:00 18078:18078 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6128211Z STAGE:2023-01-11 21:28:00 18077:18077 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6128552Z STAGE:2023-01-11 21:28:00 18078:18078 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6128874Z STAGE:2023-01-11 21:28:00 18078:18078 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6129193Z STAGE:2023-01-11 21:28:00 18077:18077 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6129519Z STAGE:2023-01-11 21:28:00 18078:18078 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6130068Z STAGE:2023-01-11 21:28:00 18077:18077 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:28:00 18078:18078 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6130092Z 2023-01-11T21:44:28.6130427Z STAGE:2023-01-11 21:28:00 18077:18077 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 
2023-01-11T21:44:28.6130749Z STAGE:2023-01-11 21:28:00 18077:18077 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6131063Z STAGE:2023-01-11 21:28:00 18078:18078 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6131392Z STAGE:2023-01-11 21:28:00 18077:18077 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6131697Z STAGE:2023-01-11 21:28:00 18078:18078 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6132033Z STAGE:2023-01-11 21:28:00 18077:18077 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6132371Z STAGE:2023-01-11 21:28:00 18078:18078 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6132751Z STAGE:2023-01-11 21:28:00 18078:18078 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6133071Z STAGE:2023-01-11 21:28:00 18077:18077 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6133403Z STAGE:2023-01-11 21:28:00 18078:18078 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6133726Z STAGE:2023-01-11 21:28:00 18077:18077 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6134064Z STAGE:2023-01-11 21:28:00 18078:18078 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6134401Z STAGE:2023-01-11 21:28:00 18077:18077 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6134700Z STAGE:2023-01-11 21:28:00 18077:18077 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6135016Z STAGE:2023-01-11 21:28:00 18078:18078 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6135349Z STAGE:2023-01-11 21:28:00 18077:18077 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6135714Z 
STAGE:2023-01-11 21:28:00 18078:18078 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6136063Z STAGE:2023-01-11 21:28:00 18077:18077 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6136400Z STAGE:2023-01-11 21:28:00 18078:18078 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6137051Z STAGE:2023-01-11 21:28:00 18078:18078 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6137381Z STAGE:2023-01-11 21:28:00 18077:18077 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6137709Z STAGE:2023-01-11 21:28:00 18078:18078 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6138020Z STAGE:2023-01-11 21:28:00 18077:18077 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6138364Z STAGE:2023-01-11 21:28:00 18078:18078 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6138707Z STAGE:2023-01-11 21:28:00 18077:18077 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6139028Z STAGE:2023-01-11 21:28:00 18077:18077 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6139345Z STAGE:2023-01-11 21:28:00 18078:18078 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6139675Z STAGE:2023-01-11 21:28:00 18077:18077 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6140000Z STAGE:2023-01-11 21:28:00 18078:18078 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6140338Z STAGE:2023-01-11 21:28:00 18077:18077 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6140675Z STAGE:2023-01-11 21:28:00 18078:18078 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6140981Z STAGE:2023-01-11 
21:28:00 18078:18078 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6141306Z STAGE:2023-01-11 21:28:00 18077:18077 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6141631Z STAGE:2023-01-11 21:28:00 18078:18078 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6141954Z STAGE:2023-01-11 21:28:00 18077:18077 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6142290Z STAGE:2023-01-11 21:28:00 18078:18078 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6142625Z STAGE:2023-01-11 21:28:00 18077:18077 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6142945Z STAGE:2023-01-11 21:28:00 18077:18077 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6143259Z STAGE:2023-01-11 21:28:00 18078:18078 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6143681Z STAGE:2023-01-11 21:28:00 18077:18077 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6143991Z STAGE:2023-01-11 21:28:00 18078:18078 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6144332Z STAGE:2023-01-11 21:28:00 18077:18077 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6144667Z STAGE:2023-01-11 21:28:00 18078:18078 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6144768Z ok (6.520s) 2023-01-11T21:44:28.6144788Z 2023-01-11T21:44:28.6145051Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6145163Z Ran 1 test in 6.520s 2023-01-11T21:44:28.6145183Z 2023-01-11T21:44:28.6145274Z OK 2023-01-11T21:44:28.6145293Z 2023-01-11T21:44:28.6145416Z Generating XML reports... 
2023-01-11T21:44:28.6145843Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212754.xml 2023-01-11T21:44:28.6146279Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6146467Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6146846Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6147037Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6147058Z 2023-01-11T21:44:28.6147168Z Running tests... 2023-01-11T21:44:28.6147426Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6147734Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6148015Z test_broadcast_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Gloo and Nccl backend supports CUDA allReduce (0.002s) 2023-01-11T21:44:28.6148039Z 2023-01-11T21:44:28.6148277Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6148388Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6148409Z 2023-01-11T21:44:28.6148517Z OK (skipped=1) 2023-01-11T21:44:28.6148535Z 2023-01-11T21:44:28.6148658Z Generating XML reports... 
2023-01-11T21:44:28.6149095Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212803.xml 2023-01-11T21:44:28.6149457Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6149632Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6150008Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6150195Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6150218Z 2023-01-11T21:44:28.6150307Z Running tests... 2023-01-11T21:44:28.6150567Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6150875Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6151136Z test_broadcast_full_group (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6151354Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 18224 2023-01-11T21:44:28.6151570Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 18225 2023-01-11T21:44:28.6151935Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6152109Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6152466Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6152710Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6153076Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6153250Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6153671Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6153860Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6154105Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6154349Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6154749Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6155126Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6155403Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6155650Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:44:28.6155873Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6156107Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:44:28.6156502Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.6156834Z STAGE:2023-01-11 21:28:09 18224:18224 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6157224Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.6157556Z STAGE:2023-01-11 21:28:09 18225:18225 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6157813Z [1673472489.429504] [7c5487d9c02b:18224:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6158045Z [1673472491.081638] [7c5487d9c02b:18224:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6158280Z [1673472491.081638] [7c5487d9c02b:18224:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6158552Z [1673472489.449898] [7c5487d9c02b:18225:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.6158779Z [1673472491.058486] [7c5487d9c02b:18225:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6159014Z [1673472491.058486] [7c5487d9c02b:18225:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6159563Z STAGE:2023-01-11 21:28:11 18224:18224 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:28:11 18225:18225 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6159583Z 2023-01-11T21:44:28.6159927Z STAGE:2023-01-11 21:28:11 18224:18224 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6160269Z STAGE:2023-01-11 21:28:11 18225:18225 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6160591Z STAGE:2023-01-11 21:28:11 18224:18224 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6160907Z STAGE:2023-01-11 21:28:11 18225:18225 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6161216Z STAGE:2023-01-11 21:28:11 18224:18224 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6161616Z STAGE:2023-01-11 21:28:11 18225:18225 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6161962Z STAGE:2023-01-11 21:28:11 18224:18224 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6162304Z STAGE:2023-01-11 21:28:11 18225:18225 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6162624Z STAGE:2023-01-11 21:28:11 18225:18225 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6162940Z STAGE:2023-01-11 21:28:11 18224:18224 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6163271Z STAGE:2023-01-11 21:28:11 18225:18225 ActivityProfilerController.cpp:306] Completed Stage: 
Collection 2023-01-11T21:44:28.6163593Z STAGE:2023-01-11 21:28:11 18224:18224 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6163932Z STAGE:2023-01-11 21:28:11 18225:18225 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6164259Z STAGE:2023-01-11 21:28:11 18224:18224 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6164623Z STAGE:2023-01-11 21:28:11 18224:18224 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6164950Z STAGE:2023-01-11 21:28:11 18225:18225 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6165276Z STAGE:2023-01-11 21:28:11 18224:18224 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6165597Z STAGE:2023-01-11 21:28:11 18225:18225 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6165935Z STAGE:2023-01-11 21:28:11 18224:18224 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6166275Z STAGE:2023-01-11 21:28:11 18225:18225 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6166599Z STAGE:2023-01-11 21:28:11 18225:18225 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6166918Z STAGE:2023-01-11 21:28:11 18224:18224 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6167229Z STAGE:2023-01-11 21:28:11 18225:18225 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6167550Z STAGE:2023-01-11 21:28:11 18224:18224 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6167888Z STAGE:2023-01-11 21:28:11 18225:18225 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6168221Z STAGE:2023-01-11 21:28:11 18224:18224 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 
2023-01-11T21:44:28.6168544Z STAGE:2023-01-11 21:28:11 18224:18224 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6168859Z STAGE:2023-01-11 21:28:11 18225:18225 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6169189Z STAGE:2023-01-11 21:28:11 18224:18224 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6169520Z STAGE:2023-01-11 21:28:11 18225:18225 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6169859Z STAGE:2023-01-11 21:28:11 18224:18224 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6170183Z STAGE:2023-01-11 21:28:11 18225:18225 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6170504Z STAGE:2023-01-11 21:28:11 18225:18225 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6170823Z STAGE:2023-01-11 21:28:11 18224:18224 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6171151Z STAGE:2023-01-11 21:28:11 18225:18225 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6171700Z STAGE:2023-01-11 21:28:11 18224:18224 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:28:11 18225:18225 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6171769Z 2023-01-11T21:44:28.6172121Z STAGE:2023-01-11 21:28:11 18224:18224 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6172443Z STAGE:2023-01-11 21:28:11 18224:18224 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6172767Z STAGE:2023-01-11 21:28:11 18225:18225 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6173094Z STAGE:2023-01-11 21:28:11 18224:18224 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6173416Z 
STAGE:2023-01-11 21:28:11 18225:18225 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6173738Z STAGE:2023-01-11 21:28:11 18224:18224 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6174075Z STAGE:2023-01-11 21:28:11 18225:18225 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6174402Z STAGE:2023-01-11 21:28:11 18225:18225 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6174800Z STAGE:2023-01-11 21:28:11 18224:18224 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6175135Z STAGE:2023-01-11 21:28:11 18225:18225 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6175459Z STAGE:2023-01-11 21:28:11 18224:18224 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6175799Z STAGE:2023-01-11 21:28:11 18225:18225 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6176135Z STAGE:2023-01-11 21:28:11 18224:18224 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6176452Z STAGE:2023-01-11 21:28:11 18224:18224 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6176948Z STAGE:2023-01-11 21:28:11 18225:18225 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6177286Z STAGE:2023-01-11 21:28:11 18224:18224 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6177622Z STAGE:2023-01-11 21:28:11 18225:18225 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6177960Z STAGE:2023-01-11 21:28:11 18224:18224 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6178303Z STAGE:2023-01-11 21:28:11 18225:18225 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6178625Z STAGE:2023-01-11 
21:28:11 18225:18225 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6178941Z STAGE:2023-01-11 21:28:11 18224:18224 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6179268Z STAGE:2023-01-11 21:28:11 18225:18225 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6179576Z STAGE:2023-01-11 21:28:11 18224:18224 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6179918Z STAGE:2023-01-11 21:28:11 18225:18225 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6180256Z STAGE:2023-01-11 21:28:11 18224:18224 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6180576Z STAGE:2023-01-11 21:28:11 18224:18224 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6180895Z STAGE:2023-01-11 21:28:11 18225:18225 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6181221Z STAGE:2023-01-11 21:28:11 18224:18224 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6181548Z STAGE:2023-01-11 21:28:11 18225:18225 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6181887Z STAGE:2023-01-11 21:28:11 18224:18224 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6182313Z STAGE:2023-01-11 21:28:11 18225:18225 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6182397Z ok (6.607s) 2023-01-11T21:44:28.6182440Z 2023-01-11T21:44:28.6182687Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6182800Z Ran 1 test in 6.607s 2023-01-11T21:44:28.6182820Z 2023-01-11T21:44:28.6182911Z OK 2023-01-11T21:44:28.6182930Z 2023-01-11T21:44:28.6183055Z Generating XML reports... 
2023-01-11T21:44:28.6183499Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212805.xml 2023-01-11T21:44:28.6183863Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6184041Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6184416Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6184591Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6184611Z 2023-01-11T21:44:28.6184720Z Running tests... 2023-01-11T21:44:28.6185040Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6185361Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6185616Z test_broadcast_group (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6185836Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 18338 2023-01-11T21:44:28.6186053Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 18339 2023-01-11T21:44:28.6186418Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6186573Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6186953Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6187145Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6187507Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6187679Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6188046Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6188233Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6188476Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6188717Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6189096Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6189493Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6189723Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6189950Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6190108Z skip: Skipped due to small world size. (4.214s) 2023-01-11T21:44:28.6190128Z 2023-01-11T21:44:28.6190391Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6190502Z Ran 1 test in 4.214s 2023-01-11T21:44:28.6190522Z 2023-01-11T21:44:28.6190630Z OK (skipped=1) 2023-01-11T21:44:28.6190649Z 2023-01-11T21:44:28.6190771Z Generating XML reports... 2023-01-11T21:44:28.6191196Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212814.xml 2023-01-11T21:44:28.6191629Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6191807Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6192181Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6192367Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6192386Z 2023-01-11T21:44:28.6192493Z Running tests... 2023-01-11T21:44:28.6192754Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6193061Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6193300Z test_broadcast_multigpu (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6193520Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 18441 2023-01-11T21:44:28.6193734Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 18442 2023-01-11T21:44:28.6194142Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6194320Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6194699Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6194889Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6195248Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6195422Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6195772Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6195961Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6196206Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6196447Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6196840Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6197228Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6197455Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6197681Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6198451Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1478: UserWarning: torch.distributed.broadcast_multigpu will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#multi-gpu-collective-functions 2023-01-11T21:44:28.6198551Z warnings.warn( 2023-01-11T21:44:28.6199310Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1478: UserWarning: torch.distributed.broadcast_multigpu will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#multi-gpu-collective-functions 2023-01-11T21:44:28.6199421Z warnings.warn( 2023-01-11T21:44:28.6199691Z [1673472506.728638] [7c5487d9c02b:18441:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6199917Z [1673472506.742397] [7c5487d9c02b:18441:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6200208Z [1673472506.742397] [7c5487d9c02b:18441:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6200482Z [1673472506.728606] [7c5487d9c02b:18442:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.6200709Z [1673472506.742400] [7c5487d9c02b:18442:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6200942Z [1673472506.742400] [7c5487d9c02b:18442:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6201043Z ok (6.137s) 2023-01-11T21:44:28.6201063Z 2023-01-11T21:44:28.6201313Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6201425Z Ran 1 test in 6.138s 2023-01-11T21:44:28.6201444Z 2023-01-11T21:44:28.6201535Z OK 2023-01-11T21:44:28.6201554Z 2023-01-11T21:44:28.6201677Z Generating XML reports... 2023-01-11T21:44:28.6202125Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212821.xml 2023-01-11T21:44:28.6202540Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6202722Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6203099Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6203287Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6203307Z 2023-01-11T21:44:28.6203398Z Running tests... 2023-01-11T21:44:28.6203660Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6203965Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6204224Z test_broadcast_object_list (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6204970Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/82847 for platform(s) linux. 
If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.630s) 2023-01-11T21:44:28.6204991Z 2023-01-11T21:44:28.6205252Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6205365Z Ran 1 test in 1.630s 2023-01-11T21:44:28.6205384Z 2023-01-11T21:44:28.6205494Z OK (skipped=1) 2023-01-11T21:44:28.6205513Z 2023-01-11T21:44:28.6205642Z Generating XML reports... 2023-01-11T21:44:28.6206060Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212830.xml 2023-01-11T21:44:28.6206425Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6206604Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6206981Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6207170Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6207189Z 2023-01-11T21:44:28.6207296Z Running tests... 2023-01-11T21:44:28.6207556Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6207863Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6208172Z test_compute_bucket_assignment_by_size_sparse_error_with_logger (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6208911Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/85012 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. 
(1.625s) 2023-01-11T21:44:28.6208982Z 2023-01-11T21:44:28.6209234Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6209347Z Ran 1 test in 1.625s 2023-01-11T21:44:28.6209366Z 2023-01-11T21:44:28.6209474Z OK (skipped=1) 2023-01-11T21:44:28.6209493Z 2023-01-11T21:44:28.6209616Z Generating XML reports... 2023-01-11T21:44:28.6210058Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212834.xml 2023-01-11T21:44:28.6210423Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6210598Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6210975Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6211167Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6211186Z 2023-01-11T21:44:28.6211277Z Running tests... 2023-01-11T21:44:28.6211581Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6211898Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6212211Z test_compute_bucket_assignment_by_size_sparse_error_without_logger (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6212946Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/85339 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. 
(1.649s) 2023-01-11T21:44:28.6212967Z 2023-01-11T21:44:28.6213223Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6213337Z Ran 1 test in 1.650s 2023-01-11T21:44:28.6213356Z 2023-01-11T21:44:28.6213463Z OK (skipped=1) 2023-01-11T21:44:28.6213482Z 2023-01-11T21:44:28.6213605Z Generating XML reports... 2023-01-11T21:44:28.6214047Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212838.xml 2023-01-11T21:44:28.6214396Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6214569Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6214942Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6215129Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6215149Z 2023-01-11T21:44:28.6215255Z Running tests... 2023-01-11T21:44:28.6215513Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6215822Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6216098Z test_ddp_apply_optim_in_backward (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6216297Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 18657 2023-01-11T21:44:28.6216511Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 18658 2023-01-11T21:44:28.6217124Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6217301Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6217680Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6217868Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6218229Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6218491Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6218870Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6219040Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6219284Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6219525Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6219922Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6220311Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6220539Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6220766Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6221590Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:738: UserWarning: DDP + apply_optim_in_backward will currently set all parameter gradients to None. If this is not the desired behavior, please set env variable DDP_OVERLAPPED_OPTIM_SET_GRADS_TO_NONE=0, and manually setgradients to None/zero as desired. 2023-01-11T21:44:28.6221713Z warnings.warn( 2023-01-11T21:44:28.6222485Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:738: UserWarning: DDP + apply_optim_in_backward will currently set all parameter gradients to None. If this is not the desired behavior, please set env variable DDP_OVERLAPPED_OPTIM_SET_GRADS_TO_NONE=0, and manually setgradients to None/zero as desired. 2023-01-11T21:44:28.6222579Z warnings.warn( 2023-01-11T21:44:28.6222856Z [1673472527.920646] [7c5487d9c02b:18657:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6223090Z [1673472527.934204] [7c5487d9c02b:18657:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6223327Z [1673472527.934204] [7c5487d9c02b:18657:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6223596Z [1673472527.927471] [7c5487d9c02b:18658:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.6223820Z [1673472527.940669] [7c5487d9c02b:18658:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6224054Z [1673472527.940669] [7c5487d9c02b:18658:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6224294Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6224526Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6224760Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6224972Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6225195Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6225420Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6225643Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6225866Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6225967Z ok (8.013s) 2023-01-11T21:44:28.6225987Z 2023-01-11T21:44:28.6226327Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6226441Z Ran 1 test in 8.013s 2023-01-11T21:44:28.6226461Z 2023-01-11T21:44:28.6226535Z OK 2023-01-11T21:44:28.6226553Z 2023-01-11T21:44:28.6226682Z Generating XML reports... 
2023-01-11T21:44:28.6227127Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212842.xml 2023-01-11T21:44:28.6227493Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6227671Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6228046Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6228237Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6228257Z 2023-01-11T21:44:28.6228365Z Running tests... 2023-01-11T21:44:28.6228612Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6228921Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6229278Z test_ddp_apply_optim_in_backward_grad_as_bucket_view_false (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6229505Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 18775 2023-01-11T21:44:28.6229723Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 18776 2023-01-11T21:44:28.6230088Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6230263Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6230630Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6230822Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6231191Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6231367Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6231737Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6231921Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6232147Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6232387Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6232781Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6233170Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6233402Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6233632Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6234400Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:738: UserWarning: DDP + apply_optim_in_backward will currently set all parameter gradients to None. If this is not the desired behavior, please set env variable DDP_OVERLAPPED_OPTIM_SET_GRADS_TO_NONE=0, and manually setgradients to None/zero as desired. 2023-01-11T21:44:28.6234516Z warnings.warn( 2023-01-11T21:44:28.6235268Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:738: UserWarning: DDP + apply_optim_in_backward will currently set all parameter gradients to None. If this is not the desired behavior, please set env variable DDP_OVERLAPPED_OPTIM_SET_GRADS_TO_NONE=0, and manually setgradients to None/zero as desired. 2023-01-11T21:44:28.6235431Z warnings.warn( 2023-01-11T21:44:28.6235709Z [1673472538.350848] [7c5487d9c02b:18776:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6235923Z [1673472538.364324] [7c5487d9c02b:18776:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6236160Z [1673472538.364324] [7c5487d9c02b:18776:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6236428Z [1673472538.341831] [7c5487d9c02b:18775:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.6236655Z [1673472538.355361] [7c5487d9c02b:18775:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6236888Z [1673472538.355361] [7c5487d9c02b:18775:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6237123Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6237393Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6237634Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6237861Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6237945Z ok (7.195s) 2023-01-11T21:44:28.6237965Z 2023-01-11T21:44:28.6238235Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6238347Z Ran 1 test in 7.196s 2023-01-11T21:44:28.6238367Z 2023-01-11T21:44:28.6238459Z OK 2023-01-11T21:44:28.6238477Z 2023-01-11T21:44:28.6238600Z Generating XML reports... 2023-01-11T21:44:28.6239043Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212853.xml 2023-01-11T21:44:28.6239416Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6239596Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6239969Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6240140Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6240159Z 2023-01-11T21:44:28.6240267Z Running tests... 
2023-01-11T21:44:28.6240526Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6240836Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6241128Z test_ddp_apply_optim_in_backward_ignored_params (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6241349Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 18893 2023-01-11T21:44:28.6241561Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 18894 2023-01-11T21:44:28.6241929Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6242088Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6242464Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6242654Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6243017Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6243190Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6243556Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6243794Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6244043Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6244284Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6244668Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6245061Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6245288Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6245513Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6246321Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:738: UserWarning: DDP + apply_optim_in_backward will currently set all parameter gradients to None. If this is not the desired behavior, please set env variable DDP_OVERLAPPED_OPTIM_SET_GRADS_TO_NONE=0, and manually setgradients to None/zero as desired. 2023-01-11T21:44:28.6246445Z warnings.warn( 2023-01-11T21:44:28.6247215Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:738: UserWarning: DDP + apply_optim_in_backward will currently set all parameter gradients to None. If this is not the desired behavior, please set env variable DDP_OVERLAPPED_OPTIM_SET_GRADS_TO_NONE=0, and manually setgradients to None/zero as desired. 2023-01-11T21:44:28.6247326Z warnings.warn( 2023-01-11T21:44:28.6247598Z [1673472548.138069] [7c5487d9c02b:18893:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6247832Z [1673472548.151662] [7c5487d9c02b:18893:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6248053Z [1673472548.151662] [7c5487d9c02b:18893:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6248325Z [1673472548.144735] [7c5487d9c02b:18894:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.6248551Z [1673472548.158196] [7c5487d9c02b:18894:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6248785Z [1673472548.158196] [7c5487d9c02b:18894:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6248887Z ok (7.940s) 2023-01-11T21:44:28.6248907Z 2023-01-11T21:44:28.6249174Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6249286Z Ran 1 test in 7.940s 2023-01-11T21:44:28.6249305Z 2023-01-11T21:44:28.6249398Z OK 2023-01-11T21:44:28.6249419Z 2023-01-11T21:44:28.6249543Z Generating XML reports... 2023-01-11T21:44:28.6249968Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212902.xml 2023-01-11T21:44:28.6250339Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6250521Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6250893Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6251083Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6251102Z 2023-01-11T21:44:28.6251210Z Running tests... 2023-01-11T21:44:28.6251469Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6251778Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6252091Z test_ddp_broadcast_buffer (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6252291Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 19013 2023-01-11T21:44:28.6252512Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 19014 2023-01-11T21:44:28.6252881Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6253056Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6253435Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6253673Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6254034Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6254207Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6254560Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6254797Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6255044Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6255284Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6255683Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6256075Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6256301Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6256528Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6257004Z [1673472558.583255] [7c5487d9c02b:19014:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6257232Z [1673472558.596561] [7c5487d9c02b:19014:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6257450Z [1673472558.596561] [7c5487d9c02b:19014:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6257716Z [1673472558.573593] [7c5487d9c02b:19013:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6257941Z [1673472558.587206] [7c5487d9c02b:19013:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6258176Z [1673472558.587206] [7c5487d9c02b:19013:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6258415Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6258652Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6258754Z ok (6.533s) 2023-01-11T21:44:28.6258774Z 2023-01-11T21:44:28.6259046Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6259158Z Ran 1 test in 6.533s 2023-01-11T21:44:28.6259178Z 2023-01-11T21:44:28.6259253Z OK 2023-01-11T21:44:28.6259272Z 2023-01-11T21:44:28.6259395Z Generating XML reports... 
2023-01-11T21:44:28.6259841Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212913.xml 2023-01-11T21:44:28.6260208Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6260384Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6260849Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6261046Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6261065Z 2023-01-11T21:44:28.6261175Z Running tests... 2023-01-11T21:44:28.6261418Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6274246Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6274588Z test_ddp_broadcast_buffer_via_hook (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6274817Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 19131 2023-01-11T21:44:28.6275035Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 19132 2023-01-11T21:44:28.6275433Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6275616Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6276127Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6276334Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6276703Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6276877Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6277250Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6277439Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6277665Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6277914Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6278318Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6278713Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6278940Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6279165Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6279439Z [1673472567.803210] [7c5487d9c02b:19131:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6279668Z [1673472567.816904] [7c5487d9c02b:19131:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6279910Z [1673472567.816904] [7c5487d9c02b:19131:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6280182Z [1673472567.804528] [7c5487d9c02b:19132:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6280390Z [1673472567.817958] [7c5487d9c02b:19132:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6280626Z [1673472567.817958] [7c5487d9c02b:19132:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6280861Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6281096Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6281325Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6281615Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T21:44:28.6281718Z ok (6.745s) 2023-01-11T21:44:28.6281740Z 2023-01-11T21:44:28.6282020Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6282134Z Ran 1 test in 6.745s 2023-01-11T21:44:28.6282154Z 2023-01-11T21:44:28.6282228Z OK 2023-01-11T21:44:28.6282246Z 2023-01-11T21:44:28.6282371Z Generating XML reports... 2023-01-11T21:44:28.6282821Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212922.xml 2023-01-11T21:44:28.6283189Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6283364Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6283738Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6283929Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6283949Z 2023-01-11T21:44:28.6284058Z Running tests... 2023-01-11T21:44:28.6284366Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6284687Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6284954Z test_ddp_buffer_hook_allreduce (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6285698Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78641 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. 
(1.632s) 2023-01-11T21:44:28.6285718Z 2023-01-11T21:44:28.6285977Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6286093Z Ran 1 test in 1.633s 2023-01-11T21:44:28.6286112Z 2023-01-11T21:44:28.6286219Z OK (skipped=1) 2023-01-11T21:44:28.6286237Z 2023-01-11T21:44:28.6286361Z Generating XML reports... 2023-01-11T21:44:28.6286807Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212931.xml 2023-01-11T21:44:28.6287171Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6287328Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6287700Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6287889Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6287908Z 2023-01-11T21:44:28.6288016Z Running tests... 2023-01-11T21:44:28.6288276Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6288586Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6288880Z test_ddp_buffer_hook_allreduce_return_future (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6289614Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77261 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. 
(1.605s) 2023-01-11T21:44:28.6289634Z 2023-01-11T21:44:28.6289892Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6289984Z Ran 1 test in 1.605s 2023-01-11T21:44:28.6290022Z 2023-01-11T21:44:28.6290110Z OK (skipped=1) 2023-01-11T21:44:28.6290128Z 2023-01-11T21:44:28.6290251Z Generating XML reports... 2023-01-11T21:44:28.6290691Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212935.xml 2023-01-11T21:44:28.6291116Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6291297Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6291671Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6291861Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6291881Z 2023-01-11T21:44:28.6291990Z Running tests... 2023-01-11T21:44:28.6292228Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6292536Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6292819Z test_ddp_build_debug_param_to_name_mapping (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6293041Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 19317 2023-01-11T21:44:28.6293256Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 19318 2023-01-11T21:44:28.6293665Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6293844Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6294221Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6294413Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6294757Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6294928Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6295295Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6295486Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6295733Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6295976Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6296372Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6297009Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6297246Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6297451Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6297669Z 2023-01-11T21:44:28.6297949Z [1673472585.358541] [7c5487d9c02b:19317:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6298183Z [1673472585.372394] [7c5487d9c02b:19317:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6298420Z [1673472585.372394] [7c5487d9c02b:19317:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6298690Z [1673472585.360732] [7c5487d9c02b:19318:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6298916Z [1673472585.374205] [7c5487d9c02b:19318:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6299148Z [1673472585.374205] [7c5487d9c02b:19318:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6299358Z ok (6.175s) 2023-01-11T21:44:28.6299378Z 2023-01-11T21:44:28.6299630Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6299742Z Ran 1 test in 6.176s 2023-01-11T21:44:28.6299761Z 2023-01-11T21:44:28.6299858Z OK 2023-01-11T21:44:28.6299877Z 2023-01-11T21:44:28.6300002Z Generating XML reports... 
2023-01-11T21:44:28.6300448Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212940.xml 2023-01-11T21:44:28.6300814Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6300990Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6301365Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6301536Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6301578Z 2023-01-11T21:44:28.6301669Z Running tests... 2023-01-11T21:44:28.6301929Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6302298Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6302608Z test_ddp_build_debug_param_to_name_mapping_requires_grad (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6302828Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 19431 2023-01-11T21:44:28.6303044Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 19432 2023-01-11T21:44:28.6303413Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6303588Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6303946Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6304143Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6304503Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6304676Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6305043Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6305227Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6305474Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6305718Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6306112Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6306492Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6306722Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6306949Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6307222Z [1673472594.175475] [7c5487d9c02b:19431:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6307451Z [1673472594.189261] [7c5487d9c02b:19431:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6307684Z [1673472594.189261] [7c5487d9c02b:19431:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6307953Z [1673472594.181158] [7c5487d9c02b:19432:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6308233Z [1673472594.194183] [7c5487d9c02b:19432:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6308470Z [1673472594.194183] [7c5487d9c02b:19432:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6308555Z ok (6.262s) 2023-01-11T21:44:28.6308593Z 2023-01-11T21:44:28.6308846Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6308956Z Ran 1 test in 6.262s 2023-01-11T21:44:28.6308975Z 2023-01-11T21:44:28.6309067Z OK 2023-01-11T21:44:28.6309086Z 2023-01-11T21:44:28.6309210Z Generating XML reports... 
2023-01-11T21:44:28.6309653Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212948.xml 2023-01-11T21:44:28.6310019Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6310198Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6310618Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6310796Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6310835Z 2023-01-11T21:44:28.6310925Z Running tests... 2023-01-11T21:44:28.6311190Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6311500Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6311761Z test_ddp_comm_hook_logging (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6311982Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 19545 2023-01-11T21:44:28.6312199Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 19546 2023-01-11T21:44:28.6312573Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6312729Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6313109Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6313298Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6313657Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6313829Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6314197Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6314384Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6314627Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6314872Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6315252Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6315645Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6315872Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6316098Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6316330Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6316564Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6316834Z [1673472602.821411] [7c5487d9c02b:19545:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6317125Z [1673472602.834999] [7c5487d9c02b:19545:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6317362Z [1673472602.834999] [7c5487d9c02b:19545:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6317632Z [1673472602.824912] [7c5487d9c02b:19546:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6317840Z [1673472602.838274] [7c5487d9c02b:19546:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6318075Z [1673472602.838274] [7c5487d9c02b:19546:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6318179Z ok (6.542s) 2023-01-11T21:44:28.6318201Z 2023-01-11T21:44:28.6318469Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6318582Z Ran 1 test in 6.542s 2023-01-11T21:44:28.6318601Z 2023-01-11T21:44:28.6318693Z OK 2023-01-11T21:44:28.6318757Z 2023-01-11T21:44:28.6318887Z Generating XML reports... 
2023-01-11T21:44:28.6319332Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212957.xml 2023-01-11T21:44:28.6319679Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6319855Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6320229Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6320420Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6320439Z 2023-01-11T21:44:28.6320554Z Running tests... 2023-01-11T21:44:28.6320811Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6321117Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6321405Z test_ddp_control_flow_different_across_ranks (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6321624Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 19663 2023-01-11T21:44:28.6321821Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 19664 2023-01-11T21:44:28.6322186Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6322358Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6322731Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6322923Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6323282Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6323455Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6323829Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6324013Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6324238Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6324479Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6324877Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6325268Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6325580Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6325812Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6326086Z [1673472611.857229] [7c5487d9c02b:19663:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6326313Z [1673472611.870855] [7c5487d9c02b:19663:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6326549Z [1673472611.870855] [7c5487d9c02b:19663:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6326802Z [1673472611.850363] [7c5487d9c02b:19664:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6327032Z [1673472611.871020] [7c5487d9c02b:19664:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6327309Z [1673472611.871020] [7c5487d9c02b:19664:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6328084Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. 
(function operator()) 2023-01-11T21:44:28.6328189Z ok (6.546s) 2023-01-11T21:44:28.6328209Z 2023-01-11T21:44:28.6328479Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6328596Z Ran 1 test in 6.547s 2023-01-11T21:44:28.6328615Z 2023-01-11T21:44:28.6328708Z OK 2023-01-11T21:44:28.6328727Z 2023-01-11T21:44:28.6328851Z Generating XML reports... 2023-01-11T21:44:28.6329300Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213006.xml 2023-01-11T21:44:28.6329669Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6329826Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6330203Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6330393Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6330412Z 2023-01-11T21:44:28.6330519Z Running tests... 2023-01-11T21:44:28.6330779Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6331091Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6331371Z test_ddp_control_flow_same_across_ranks (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6332112Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78235 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. 
(1.637s) 2023-01-11T21:44:28.6332132Z 2023-01-11T21:44:28.6332391Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6332484Z Ran 1 test in 1.637s 2023-01-11T21:44:28.6332521Z 2023-01-11T21:44:28.6332610Z OK (skipped=1) 2023-01-11T21:44:28.6332629Z 2023-01-11T21:44:28.6332754Z Generating XML reports... 2023-01-11T21:44:28.6333194Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213015.xml 2023-01-11T21:44:28.6333619Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6333794Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6334171Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6334361Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6334380Z 2023-01-11T21:44:28.6334488Z Running tests... 2023-01-11T21:44:28.6334799Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6335107Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6335362Z test_ddp_create_graph (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6335585Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 19815 2023-01-11T21:44:28.6335802Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 19816 2023-01-11T21:44:28.6336222Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6336401Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6337057Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6337249Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6337600Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6337772Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6338144Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6338338Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6338586Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6338829Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6339226Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6339618Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6339827Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6340053Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6340327Z [1673472623.807182] [7c5487d9c02b:19815:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6340565Z [1673472625.213281] [7c5487d9c02b:19815:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6340801Z [1673472625.213281] [7c5487d9c02b:19815:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6341685Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T21:44:28.6341955Z [1673472623.809701] [7c5487d9c02b:19816:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6342271Z [1673472625.221049] [7c5487d9c02b:19816:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6342508Z [1673472625.221049] [7c5487d9c02b:19816:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6343383Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. 
If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T21:44:28.6344594Z /opt/conda/lib/python3.10/site-packages/torch/autograd/__init__.py:197: UserWarning: Using backward() with create_graph=True will create a reference cycle between the parameter and its gradient which can cause a memory leak. We recommend using autograd.grad when creating the graph to avoid this. If you have to use this function, make sure to reset the .grad fields of your parameters to None after use to break the cycle and avoid the leak. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/engine.cpp:1134.) 2023-01-11T21:44:28.6344836Z Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2023-01-11T21:44:28.6345985Z /opt/conda/lib/python3.10/site-packages/torch/autograd/__init__.py:197: UserWarning: Using backward() with create_graph=True will create a reference cycle between the parameter and its gradient which can cause a memory leak. We recommend using autograd.grad when creating the graph to avoid this. If you have to use this function, make sure to reset the .grad fields of your parameters to None after use to break the cycle and avoid the leak. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/engine.cpp:1134.) 2023-01-11T21:44:28.6346216Z Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2023-01-11T21:44:28.6346456Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6346693Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6347572Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. 
If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T21:44:28.6348448Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T21:44:28.6349321Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T21:44:28.6350181Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T21:44:28.6351043Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T21:44:28.6351964Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. 
If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T21:44:28.6352824Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T21:44:28.6353786Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T21:44:28.6354659Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T21:44:28.6355517Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. 
If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T21:44:28.6355627Z ok (6.044s) 2023-01-11T21:44:28.6355647Z 2023-01-11T21:44:28.6355916Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6356010Z Ran 1 test in 6.044s 2023-01-11T21:44:28.6356030Z 2023-01-11T21:44:28.6356121Z OK 2023-01-11T21:44:28.6356140Z 2023-01-11T21:44:28.6356263Z Generating XML reports... 2023-01-11T21:44:28.6356706Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213019.xml 2023-01-11T21:44:28.6357074Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6357249Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6357626Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6357822Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6357842Z 2023-01-11T21:44:28.6357932Z Running tests... 2023-01-11T21:44:28.6358198Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6358507Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6358753Z test_ddp_device (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6359487Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77324 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. 
(1.616s) 2023-01-11T21:44:28.6359508Z 2023-01-11T21:44:28.6359769Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6359973Z Ran 1 test in 1.616s 2023-01-11T21:44:28.6359992Z 2023-01-11T21:44:28.6360099Z OK (skipped=1) 2023-01-11T21:44:28.6360117Z 2023-01-11T21:44:28.6360242Z Generating XML reports... 2023-01-11T21:44:28.6360691Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213028.xml 2023-01-11T21:44:28.6361040Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6361216Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6361590Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6361779Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6361798Z 2023-01-11T21:44:28.6361907Z Running tests... 2023-01-11T21:44:28.6362166Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6362477Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6362792Z test_ddp_forward_backward_hook (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6363000Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 19963 2023-01-11T21:44:28.6363216Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 19964 2023-01-11T21:44:28.6363587Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6363761Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6364136Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6364325Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6364681Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6364858Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6365229Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6365400Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6365645Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6365888Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6366284Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6366675Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6366907Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6367134Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6367917Z /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1331: UserWarning: Using a non-full backward hook when the forward contains multiple autograd Nodes is deprecated and will be removed in future versions. This hook will be missing some grad_input. Please use register_full_backward_hook to get the documented behavior. 2023-01-11T21:44:28.6368244Z warnings.warn("Using a non-full backward hook when the forward contains multiple autograd Nodes " 2023-01-11T21:44:28.6369020Z /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1331: UserWarning: Using a non-full backward hook when the forward contains multiple autograd Nodes is deprecated and will be removed in future versions. This hook will be missing some grad_input. Please use register_full_backward_hook to get the documented behavior. 2023-01-11T21:44:28.6369407Z warnings.warn("Using a non-full backward hook when the forward contains multiple autograd Nodes " 2023-01-11T21:44:28.6369665Z [1673472637.959167] [7c5487d9c02b:19964:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6369897Z [1673472637.972650] [7c5487d9c02b:19964:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6370133Z [1673472637.972650] [7c5487d9c02b:19964:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6370401Z [1673472637.949343] [7c5487d9c02b:19963:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.6370626Z [1673472637.963374] [7c5487d9c02b:19963:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6370866Z [1673472637.963374] [7c5487d9c02b:19963:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6370968Z ok (6.640s) 2023-01-11T21:44:28.6371031Z 2023-01-11T21:44:28.6371306Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6371420Z Ran 1 test in 6.641s 2023-01-11T21:44:28.6371440Z 2023-01-11T21:44:28.6371514Z OK 2023-01-11T21:44:28.6371532Z 2023-01-11T21:44:28.6371657Z Generating XML reports... 2023-01-11T21:44:28.6372101Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213032.xml 2023-01-11T21:44:28.6372469Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6372644Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6373021Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6373217Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6373236Z 2023-01-11T21:44:28.6373348Z Running tests... 2023-01-11T21:44:28.6373609Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6373901Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6374170Z test_ddp_grad_div_uneven_inputs (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6374907Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78685 for platform(s) linux. 
If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.610s) 2023-01-11T21:44:28.6374928Z 2023-01-11T21:44:28.6375189Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6375304Z Ran 1 test in 1.610s 2023-01-11T21:44:28.6375323Z 2023-01-11T21:44:28.6375430Z OK (skipped=1) 2023-01-11T21:44:28.6375449Z 2023-01-11T21:44:28.6375573Z Generating XML reports... 2023-01-11T21:44:28.6376012Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213041.xml 2023-01-11T21:44:28.6376379Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6376763Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6377162Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6377353Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6377373Z 2023-01-11T21:44:28.6377484Z Running tests... 2023-01-11T21:44:28.6377747Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6378146Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6378422Z test_ddp_hook_parity_allreduce (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6379161Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77293 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. 
(1.633s) 2023-01-11T21:44:28.6379182Z 2023-01-11T21:44:28.6379441Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6379552Z Ran 1 test in 1.633s 2023-01-11T21:44:28.6379571Z 2023-01-11T21:44:28.6379660Z OK (skipped=1) 2023-01-11T21:44:28.6379678Z 2023-01-11T21:44:28.6379802Z Generating XML reports... 2023-01-11T21:44:28.6380241Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213045.xml 2023-01-11T21:44:28.6380675Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6380857Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6381237Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6381428Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6381448Z 2023-01-11T21:44:28.6381557Z Running tests... 2023-01-11T21:44:28.6381818Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6382107Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6382395Z test_ddp_hook_parity_allreduce_process_group (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6382617Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 20179 2023-01-11T21:44:28.6382831Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 20180 2023-01-11T21:44:28.6383200Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6383375Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6383751Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6383940Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6384278Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6384451Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6384818Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6385008Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6385255Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6385495Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6385892Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6386287Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6386515Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6386733Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:44:28.6387017Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6387256Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:44:28.6387657Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.6388049Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.6388323Z [1673472655.440393] [7c5487d9c02b:20180:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6388553Z [1673472655.453955] [7c5487d9c02b:20180:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6388789Z [1673472655.453955] [7c5487d9c02b:20180:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6389062Z [1673472655.433239] [7c5487d9c02b:20179:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6389333Z [1673472655.446523] [7c5487d9c02b:20179:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6389555Z [1673472655.446523] [7c5487d9c02b:20179:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6389789Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6390023Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T21:44:28.6390253Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6390483Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6390594Z ok (6.926s) 2023-01-11T21:44:28.6390614Z 2023-01-11T21:44:28.6390882Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6390994Z Ran 1 test in 6.926s 2023-01-11T21:44:28.6391013Z 2023-01-11T21:44:28.6391090Z OK 2023-01-11T21:44:28.6391129Z 2023-01-11T21:44:28.6391233Z Generating XML reports... 2023-01-11T21:44:28.6391677Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213050.xml 2023-01-11T21:44:28.6392048Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6392223Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6392600Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6392789Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6392808Z 2023-01-11T21:44:28.6392920Z Running tests... 2023-01-11T21:44:28.6393176Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6393468Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6393741Z test_ddp_hook_parity_post_localSGD (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6393957Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 20297 2023-01-11T21:44:28.6394172Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 20298 2023-01-11T21:44:28.6394537Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6394710Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6395087Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6395327Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6395671Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6395848Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6396221Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6396408Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6396651Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6396892Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6397288Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6397679Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6397911Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6398213Z INFO:torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook:Local SGD will be started after 10 iterations 2023-01-11T21:44:28.6398441Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6398707Z INFO:torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook:Local SGD will be started after 10 iterations 2023-01-11T21:44:28.6398983Z [1673472664.800984] [7c5487d9c02b:20298:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6399212Z [1673472664.814407] [7c5487d9c02b:20298:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6399449Z [1673472664.814407] [7c5487d9c02b:20298:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6399724Z [1673472664.797869] [7c5487d9c02b:20297:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6399953Z [1673472664.811814] [7c5487d9c02b:20297:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6400186Z [1673472664.811814] [7c5487d9c02b:20297:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6400420Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6400637Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6400867Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6401096Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T21:44:28.6401376Z INFO:torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook:Start to apply local SGD after 10 iterations. 2023-01-11T21:44:28.6401653Z INFO:torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook:Start to apply local SGD after 10 iterations. 2023-01-11T21:44:28.6401923Z INFO:torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook:Local SGD will be started after 10 iterations 2023-01-11T21:44:28.6402192Z INFO:torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook:Local SGD will be started after 10 iterations 2023-01-11T21:44:28.6402424Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6402652Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6402862Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6403144Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6403417Z INFO:torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook:Start to apply local SGD after 10 iterations. 2023-01-11T21:44:28.6403696Z INFO:torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook:Start to apply local SGD after 10 iterations. 2023-01-11T21:44:28.6403968Z INFO:torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook:Local SGD will be started after 1000 iterations 2023-01-11T21:44:28.6404235Z INFO:torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook:Local SGD will be started after 1000 iterations 2023-01-11T21:44:28.6404468Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6404696Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6404920Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T21:44:28.6405132Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6405234Z ok (7.215s) 2023-01-11T21:44:28.6405253Z 2023-01-11T21:44:28.6405572Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6405687Z Ran 1 test in 7.216s 2023-01-11T21:44:28.6405706Z 2023-01-11T21:44:28.6405799Z OK 2023-01-11T21:44:28.6405818Z 2023-01-11T21:44:28.6405942Z Generating XML reports... 2023-01-11T21:44:28.6406387Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213059.xml 2023-01-11T21:44:28.6406754Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6406911Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6407286Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6407480Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6407499Z 2023-01-11T21:44:28.6407607Z Running tests... 2023-01-11T21:44:28.6407871Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6408176Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6408443Z test_ddp_hook_parity_powerSGD (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6409179Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77378 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. 
(1.622s) 2023-01-11T21:44:28.6409200Z 2023-01-11T21:44:28.6409462Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6409578Z Ran 1 test in 1.622s 2023-01-11T21:44:28.6409597Z 2023-01-11T21:44:28.6409684Z OK (skipped=1) 2023-01-11T21:44:28.6409703Z 2023-01-11T21:44:28.6409827Z Generating XML reports... 2023-01-11T21:44:28.6410271Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213109.xml 2023-01-11T21:44:28.6410636Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6410811Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6411184Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6411375Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6411394Z 2023-01-11T21:44:28.6411502Z Running tests... 2023-01-11T21:44:28.6411743Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6412108Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6412375Z test_ddp_hook_pickling_powerSGD (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6412598Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 20449 2023-01-11T21:44:28.6412812Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 20450 2023-01-11T21:44:28.6413178Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6413349Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6413720Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6413909Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6414248Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6414422Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6414843Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6415035Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6415280Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6415523Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6415923Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6416314Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6416755Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6417308Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 4; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2023-01-11T21:44:28.6417532Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6418065Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 4; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2023-01-11T21:44:28.6418300Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6418534Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6418806Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:Start to apply PowerSGD after 4 iterations. 2023-01-11T21:44:28.6419074Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:Start to apply PowerSGD after 4 iterations. 2023-01-11T21:44:28.6419370Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:A zero tensor of length 10 that represents local error is created. 2023-01-11T21:44:28.6419660Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:A zero tensor of length 10 that represents local error is created. 
2023-01-11T21:44:28.6419983Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:Compression stats: iter 4, total before compression 10, total after compression 10, rate 1.0 2023-01-11T21:44:28.6420304Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:Compression stats: iter 4, total before compression 10, total after compression 10, rate 1.0 2023-01-11T21:44:28.6420712Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:Allocating contiguous memory of length 0 for Ps, and of length 0 for Qs, respectively. 2023-01-11T21:44:28.6421032Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:Allocating contiguous memory of length 0 for Ps, and of length 0 for Qs, respectively. 2023-01-11T21:44:28.6421306Z [1673472678.738671] [7c5487d9c02b:20449:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6421537Z [1673472678.752256] [7c5487d9c02b:20449:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6421773Z [1673472678.752256] [7c5487d9c02b:20449:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6422040Z [1673472678.747883] [7c5487d9c02b:20450:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6422341Z [1673472678.761066] [7c5487d9c02b:20450:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6422586Z [1673472678.761066] [7c5487d9c02b:20450:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6422821Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6423049Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T21:44:28.6423133Z ok (6.632s) 2023-01-11T21:44:28.6423171Z 2023-01-11T21:44:28.6423431Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6423541Z Ran 1 test in 6.632s 2023-01-11T21:44:28.6423561Z 2023-01-11T21:44:28.6423653Z OK 2023-01-11T21:44:28.6423673Z 2023-01-11T21:44:28.6423797Z Generating XML reports... 2023-01-11T21:44:28.6424249Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213113.xml 2023-01-11T21:44:28.6424620Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6424796Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6425171Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6425342Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6425361Z 2023-01-11T21:44:28.6425469Z Running tests... 2023-01-11T21:44:28.6425729Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6426035Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6426422Z test_ddp_hook_with_optimizer_parity_adam_optimize_subset_False (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T21:44:28.6426445Z 2023-01-11T21:44:28.6426707Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6426818Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6426837Z 2023-01-11T21:44:28.6426945Z OK (skipped=1) 2023-01-11T21:44:28.6426964Z 2023-01-11T21:44:28.6427087Z Generating XML reports... 
2023-01-11T21:44:28.6427506Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213122.xml 2023-01-11T21:44:28.6427873Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6428046Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6428420Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6428669Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6428688Z 2023-01-11T21:44:28.6428796Z Running tests... 2023-01-11T21:44:28.6429064Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6429372Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6429755Z test_ddp_hook_with_optimizer_parity_adam_optimize_subset_True (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T21:44:28.6429777Z 2023-01-11T21:44:28.6430017Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6430128Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6430147Z 2023-01-11T21:44:28.6430254Z OK (skipped=1) 2023-01-11T21:44:28.6430272Z 2023-01-11T21:44:28.6430395Z Generating XML reports... 
2023-01-11T21:44:28.6430834Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213125.xml 2023-01-11T21:44:28.6431247Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6431424Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6431802Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6431991Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6432010Z 2023-01-11T21:44:28.6432098Z Running tests... 2023-01-11T21:44:28.6432356Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6432663Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6433103Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_False_optimize_subset_False (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T21:44:28.6433127Z 2023-01-11T21:44:28.6433387Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6433498Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6433517Z 2023-01-11T21:44:28.6433625Z OK (skipped=1) 2023-01-11T21:44:28.6433644Z 2023-01-11T21:44:28.6433766Z Generating XML reports... 
2023-01-11T21:44:28.6434205Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213127.xml 2023-01-11T21:44:28.6434552Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6434726Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6435100Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6435292Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6435311Z 2023-01-11T21:44:28.6435418Z Running tests... 2023-01-11T21:44:28.6435678Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6435988Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6436428Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_False_optimize_subset_True (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T21:44:28.6436448Z 2023-01-11T21:44:28.6436707Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6436800Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6436837Z 2023-01-11T21:44:28.6436926Z OK (skipped=1) 2023-01-11T21:44:28.6436945Z 2023-01-11T21:44:28.6437066Z Generating XML reports... 
2023-01-11T21:44:28.6437560Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213129.xml 2023-01-11T21:44:28.6437928Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6438104Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6438479Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6438667Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6438687Z 2023-01-11T21:44:28.6438795Z Running tests... 2023-01-11T21:44:28.6439037Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6439342Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6439778Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_True_optimize_subset_False (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T21:44:28.6439802Z 2023-01-11T21:44:28.6440104Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6440219Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6440238Z 2023-01-11T21:44:28.6440346Z OK (skipped=1) 2023-01-11T21:44:28.6440365Z 2023-01-11T21:44:28.6440488Z Generating XML reports... 
2023-01-11T21:44:28.6440926Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213132.xml 2023-01-11T21:44:28.6441292Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6441466Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6441823Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6442017Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6442037Z 2023-01-11T21:44:28.6442144Z Running tests... 2023-01-11T21:44:28.6442407Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6442714Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6443151Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_True_optimize_subset_True (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T21:44:28.6443171Z 2023-01-11T21:44:28.6443427Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6443537Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6443556Z 2023-01-11T21:44:28.6443663Z OK (skipped=1) 2023-01-11T21:44:28.6443682Z 2023-01-11T21:44:28.6443790Z Generating XML reports... 
2023-01-11T21:44:28.6444223Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213134.xml 2023-01-11T21:44:28.6444589Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6444763Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6445137Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6445326Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6445346Z 2023-01-11T21:44:28.6445453Z Running tests... 2023-01-11T21:44:28.6445711Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6445997Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6446434Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_False_optimize_subset_False (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T21:44:28.6446525Z 2023-01-11T21:44:28.6446772Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6446880Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6446899Z 2023-01-11T21:44:28.6447007Z OK (skipped=1) 2023-01-11T21:44:28.6447025Z 2023-01-11T21:44:28.6447148Z Generating XML reports... 
2023-01-11T21:44:28.6447587Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213137.xml 2023-01-11T21:44:28.6447952Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6448126Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6448498Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6448671Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6448690Z 2023-01-11T21:44:28.6448844Z Running tests... 2023-01-11T21:44:28.6449111Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6449419Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6449856Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_False_optimize_subset_True (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T21:44:28.6449876Z 2023-01-11T21:44:28.6450134Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6450245Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6450264Z 2023-01-11T21:44:28.6450371Z OK (skipped=1) 2023-01-11T21:44:28.6450393Z 2023-01-11T21:44:28.6450516Z Generating XML reports... 
2023-01-11T21:44:28.6450939Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213139.xml 2023-01-11T21:44:28.6451305Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6451478Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6451852Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6452041Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6452060Z 2023-01-11T21:44:28.6452166Z Running tests... 2023-01-11T21:44:28.6452426Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6452734Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6453177Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_True_optimize_subset_False (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T21:44:28.6453198Z 2023-01-11T21:44:28.6453454Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6453547Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6453566Z 2023-01-11T21:44:28.6453727Z OK (skipped=1) 2023-01-11T21:44:28.6453746Z 2023-01-11T21:44:28.6453871Z Generating XML reports... 
2023-01-11T21:44:28.6454314Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213141.xml 2023-01-11T21:44:28.6454679Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6454854Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6455292Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6455488Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6455507Z 2023-01-11T21:44:28.6455596Z Running tests... 2023-01-11T21:44:28.6455854Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6456163Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6456816Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_True_optimize_subset_True (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T21:44:28.6456835Z 2023-01-11T21:44:28.6457109Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6457223Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6457246Z 2023-01-11T21:44:28.6457359Z OK (skipped=1) 2023-01-11T21:44:28.6457377Z 2023-01-11T21:44:28.6457500Z Generating XML reports... 
2023-01-11T21:44:28.6458059Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213144.xml 2023-01-11T21:44:28.6458441Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6458598Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6458971Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6459160Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6459180Z 2023-01-11T21:44:28.6459286Z Running tests... 2023-01-11T21:44:28.6459545Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6459853Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6460244Z test_ddp_hook_with_optimizer_parity_sgd_optimize_subset_False (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T21:44:28.6460264Z 2023-01-11T21:44:28.6460524Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6460635Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6460654Z 2023-01-11T21:44:28.6460743Z OK (skipped=1) 2023-01-11T21:44:28.6460761Z 2023-01-11T21:44:28.6460883Z Generating XML reports... 
2023-01-11T21:44:28.6461323Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213146.xml 2023-01-11T21:44:28.6461686Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6461860Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6462237Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6462432Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6462454Z 2023-01-11T21:44:28.6462562Z Running tests... 2023-01-11T21:44:28.6462803Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6463105Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6463485Z test_ddp_hook_with_optimizer_parity_sgd_optimize_subset_True (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T21:44:28.6463506Z 2023-01-11T21:44:28.6463763Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6463873Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6463892Z 2023-01-11T21:44:28.6463999Z OK (skipped=1) 2023-01-11T21:44:28.6464086Z 2023-01-11T21:44:28.6464215Z Generating XML reports... 
2023-01-11T21:44:28.6464655Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213148.xml 2023-01-11T21:44:28.6465025Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6465181Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6465554Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6465743Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6465762Z 2023-01-11T21:44:28.6465870Z Running tests... 2023-01-11T21:44:28.6466129Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6466435Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6466700Z test_ddp_ignore_params_arg (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6467482Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77325 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.650s) 2023-01-11T21:44:28.6467505Z 2023-01-11T21:44:28.6467772Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6467884Z Ran 1 test in 1.651s 2023-01-11T21:44:28.6467903Z 2023-01-11T21:44:28.6467991Z OK (skipped=1) 2023-01-11T21:44:28.6468011Z 2023-01-11T21:44:28.6468133Z Generating XML reports... 
2023-01-11T21:44:28.6468571Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213151.xml 2023-01-11T21:44:28.6468938Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6469118Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6469493Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6469683Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6469702Z 2023-01-11T21:44:28.6469814Z Running tests... 2023-01-11T21:44:28.6470072Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6470360Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6470611Z test_ddp_inference (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6470828Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 20997 2023-01-11T21:44:28.6471043Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 20998 2023-01-11T21:44:28.6471408Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6471587Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6471963Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6472151Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6472491Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6472664Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6473032Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6473216Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6473515Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6473762Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6474162Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6474553Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6474780Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6474987Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6475259Z [1673472720.781256] [7c5487d9c02b:20997:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6475492Z [1673472720.795006] [7c5487d9c02b:20997:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6475776Z [1673472720.795006] [7c5487d9c02b:20997:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6476051Z [1673472720.787287] [7c5487d9c02b:20998:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6476278Z [1673472720.800739] [7c5487d9c02b:20998:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6476513Z [1673472720.800739] [7c5487d9c02b:20998:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6476616Z ok (7.058s) 2023-01-11T21:44:28.6476636Z 2023-01-11T21:44:28.6476904Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6476998Z Ran 1 test in 7.059s 2023-01-11T21:44:28.6477040Z 2023-01-11T21:44:28.6477114Z OK 2023-01-11T21:44:28.6477134Z 2023-01-11T21:44:28.6477256Z Generating XML reports... 
2023-01-11T21:44:28.6477701Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213155.xml 2023-01-11T21:44:28.6478068Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6478244Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6478618Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6478808Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6478827Z 2023-01-11T21:44:28.6478935Z Running tests... 2023-01-11T21:44:28.6479178Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6479484Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6479755Z test_ddp_join_model_equivalence (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6479974Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 21111 2023-01-11T21:44:28.6480190Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 21112 2023-01-11T21:44:28.6480556Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6480729Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6481101Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6481271Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6481630Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6481869Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6482248Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6482438Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6482680Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6482922Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6483319Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6483711Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6483921Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6484150Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6484426Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6484664Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6484935Z [1673472730.874486] [7c5487d9c02b:21112:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6485170Z [1673472730.887887] [7c5487d9c02b:21112:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6485407Z [1673472730.887887] [7c5487d9c02b:21112:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6485676Z [1673472730.872057] [7c5487d9c02b:21111:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.6485906Z [1673472730.885165] [7c5487d9c02b:21111:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6486142Z [1673472730.885165] [7c5487d9c02b:21111:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6486529Z /opt/conda/lib/python3.10/tempfile.py:860: ResourceWarning: Implicitly cleaning up 2023-01-11T21:44:28.6486692Z _warnings.warn(warn_message, ResourceWarning) 2023-01-11T21:44:28.6487087Z /opt/conda/lib/python3.10/tempfile.py:860: ResourceWarning: Implicitly cleaning up 2023-01-11T21:44:28.6487250Z _warnings.warn(warn_message, ResourceWarning) 2023-01-11T21:44:28.6487352Z ok (6.672s) 2023-01-11T21:44:28.6487372Z 2023-01-11T21:44:28.6487635Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6487752Z Ran 1 test in 6.673s 2023-01-11T21:44:28.6487772Z 2023-01-11T21:44:28.6487863Z OK 2023-01-11T21:44:28.6487882Z 2023-01-11T21:44:28.6487988Z Generating XML reports... 2023-01-11T21:44:28.6488431Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213205.xml 2023-01-11T21:44:28.6488799Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6488974Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6489348Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6489538Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6489558Z 2023-01-11T21:44:28.6489667Z Running tests... 
2023-01-11T21:44:28.6489926Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6490289Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6490531Z test_ddp_logging_data_cpu (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6490754Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 21229 2023-01-11T21:44:28.6490969Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 21230 2023-01-11T21:44:28.6491336Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6491509Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6491884Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6492072Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6492430Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6492586Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6493000Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6493191Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6493438Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6493681Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6494078Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6494472Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6494699Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6494928Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6495145Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6495377Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6495651Z [1673472738.197122] [7c5487d9c02b:21229:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6495881Z [1673472739.640623] [7c5487d9c02b:21229:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6496116Z [1673472739.640623] [7c5487d9c02b:21229:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6496384Z [1673472738.198737] [7c5487d9c02b:21230:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6496791Z [1673472739.600586] [7c5487d9c02b:21230:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6497037Z [1673472739.600586] [7c5487d9c02b:21230:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6497142Z ok (6.211s) 2023-01-11T21:44:28.6497163Z 2023-01-11T21:44:28.6497432Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6497528Z Ran 1 test in 6.211s 2023-01-11T21:44:28.6497547Z 2023-01-11T21:44:28.6497641Z OK 2023-01-11T21:44:28.6497661Z 2023-01-11T21:44:28.6497783Z Generating XML reports... 
2023-01-11T21:44:28.6498227Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213214.xml 2023-01-11T21:44:28.6498594Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6498854Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6499237Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6499428Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6499448Z 2023-01-11T21:44:28.6499538Z Running tests... 2023-01-11T21:44:28.6499800Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6500113Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6500374Z test_ddp_logging_data_gpu (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6500590Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 21373 2023-01-11T21:44:28.6500804Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 21374 2023-01-11T21:44:28.6501174Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6501407Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6501789Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6501960Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6502323Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6502496Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6502862Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6503050Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6503297Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6503539Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6503938Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6504311Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6504538Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6504762Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6505034Z [1673472748.347943] [7c5487d9c02b:21373:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6505265Z [1673472748.361078] [7c5487d9c02b:21373:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6505506Z [1673472748.361078] [7c5487d9c02b:21373:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6505739Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6506008Z [1673472748.351191] [7c5487d9c02b:21374:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6506236Z [1673472748.364607] [7c5487d9c02b:21374:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6506468Z [1673472748.364607] [7c5487d9c02b:21374:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6506683Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6506841Z ok (6.632s) 2023-01-11T21:44:28.6506861Z 2023-01-11T21:44:28.6507130Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6507242Z Ran 1 test in 6.632s 2023-01-11T21:44:28.6507261Z 2023-01-11T21:44:28.6507359Z OK 2023-01-11T21:44:28.6507379Z 2023-01-11T21:44:28.6507505Z Generating XML reports... 
2023-01-11T21:44:28.6507949Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213223.xml 2023-01-11T21:44:28.6508316Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6508489Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6508843Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6509031Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6509054Z 2023-01-11T21:44:28.6509162Z Running tests... 2023-01-11T21:44:28.6509420Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6509774Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6510063Z test_ddp_model_diff_num_params_across_ranks (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6510283Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 21491 2023-01-11T21:44:28.6510498Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 21492 2023-01-11T21:44:28.6510849Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6511023Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6511396Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6511588Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6511948Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6512121Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6512489Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6512675Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6512918Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6513141Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6513535Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6513931Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6514160Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6514385Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6514621Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:44:28.6514860Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:44:28.6515250Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.6515636Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.6515855Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T21:44:28.6516141Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T21:44:28.6516540Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T21:44:28.6516930Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T21:44:28.6517203Z [1673472757.459679] [7c5487d9c02b:21491:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6517433Z [1673472757.473583] [7c5487d9c02b:21491:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6517668Z [1673472757.473583] [7c5487d9c02b:21491:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6517936Z [1673472757.461886] [7c5487d9c02b:21492:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. 
Fall back to non cooperative launch. 2023-01-11T21:44:28.6518218Z [1673472757.474491] [7c5487d9c02b:21492:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6518457Z [1673472757.474491] [7c5487d9c02b:21492:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6518541Z ok (6.156s) 2023-01-11T21:44:28.6518561Z 2023-01-11T21:44:28.6518828Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6518939Z Ran 1 test in 6.156s 2023-01-11T21:44:28.6518959Z 2023-01-11T21:44:28.6519050Z OK 2023-01-11T21:44:28.6519069Z 2023-01-11T21:44:28.6519193Z Generating XML reports... 2023-01-11T21:44:28.6519637Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213232.xml 2023-01-11T21:44:28.6520011Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6520187Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6520550Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6520740Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6520760Z 2023-01-11T21:44:28.6520868Z Running tests... 2023-01-11T21:44:28.6521130Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6521440Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6521717Z test_ddp_model_diff_shape_across_ranks (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6521934Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 21611 2023-01-11T21:44:28.6522152Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 21612 2023-01-11T21:44:28.6522516Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6522675Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6523047Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6523236Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6523594Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6523763Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6524130Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6524319Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6524609Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6524838Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6525236Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6525627Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6525854Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6526080Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6526316Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:44:28.6526554Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:44:28.6526949Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.6527386Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.6527627Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T21:44:28.6527844Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T21:44:28.6528232Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T21:44:28.6528623Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T21:44:28.6528893Z [1673472766.253232] [7c5487d9c02b:21612:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.6529130Z [1673472766.266507] [7c5487d9c02b:21612:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6529367Z [1673472766.266507] [7c5487d9c02b:21612:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6529745Z [1673472776.621888] [7c5487d9c02b:21612:0] tag_match.c:62 UCX WARN unexpected tag-receive descriptor 0x37d6e2c0 was not matched 2023-01-11T21:44:28.6530013Z [1673472766.244635] [7c5487d9c02b:21611:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6530239Z [1673472766.258032] [7c5487d9c02b:21611:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6530471Z [1673472766.258032] [7c5487d9c02b:21611:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6530767Z [1673472776.590812] [7c5487d9c02b:21611:1] ucc_schedule.h:189 UCC WARN timeout 10 sec. has expired on req 0x336641c0, seq_num 3, TL_UCP, team_id 1, size 2, rank 0, ctx_rank 0: Barrier n/a inplace=0 bytes=0 2023-01-11T21:44:28.6531038Z [1673472776.621803] [7c5487d9c02b:21611:0] mpool.c:55 UCX WARN object 0x3379f980 {flags:0x20040 recv length 0 host memory} was not returned to mpool ucp_requests 2023-01-11T21:44:28.6531143Z ok (16.163s) 2023-01-11T21:44:28.6531163Z 2023-01-11T21:44:28.6531428Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6531541Z Ran 1 test in 16.164s 2023-01-11T21:44:28.6531560Z 2023-01-11T21:44:28.6531651Z OK 2023-01-11T21:44:28.6531670Z 2023-01-11T21:44:28.6531794Z Generating XML reports... 
2023-01-11T21:44:28.6532234Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213240.xml 2023-01-11T21:44:28.6532652Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6532814Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6533193Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6533383Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6533403Z 2023-01-11T21:44:28.6533511Z Running tests... 2023-01-11T21:44:28.6533775Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6534083Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6534390Z test_ddp_multiple_nested_unused_params_err_ignore_params (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6534609Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 21731 2023-01-11T21:44:28.6534828Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 21732 2023-01-11T21:44:28.6535217Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6535399Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6535775Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6535964Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6536322Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6536493Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6537127Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6537319Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6537547Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6537793Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6538191Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6538583Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6538809Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6539029Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6539304Z [1673472784.933121] [7c5487d9c02b:21731:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6539539Z [1673472784.946739] [7c5487d9c02b:21731:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6539775Z [1673472784.946739] [7c5487d9c02b:21731:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6540042Z [1673472784.939482] [7c5487d9c02b:21732:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6540252Z [1673472784.952722] [7c5487d9c02b:21732:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6540486Z [1673472784.952722] [7c5487d9c02b:21732:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6540589Z ok (6.939s) 2023-01-11T21:44:28.6540609Z 2023-01-11T21:44:28.6540980Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6541093Z Ran 1 test in 6.939s 2023-01-11T21:44:28.6541113Z 2023-01-11T21:44:28.6541205Z OK 2023-01-11T21:44:28.6541223Z 2023-01-11T21:44:28.6541351Z Generating XML reports... 
2023-01-11T21:44:28.6541796Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213259.xml 2023-01-11T21:44:28.6542164Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6542322Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6542697Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6542887Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6542906Z 2023-01-11T21:44:28.6543014Z Running tests... 2023-01-11T21:44:28.6543278Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6543588Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6543937Z test_ddp_multiple_nested_unused_params_error (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6544163Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 21849 2023-01-11T21:44:28.6544361Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 21850 2023-01-11T21:44:28.6544731Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6544907Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6545282Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6545468Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6545834Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6546010Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6546379Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6546564Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6546788Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6547027Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6547421Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6547814Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6548044Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6548271Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6548544Z [1673472794.355232] [7c5487d9c02b:21849:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6548773Z [1673472794.368651] [7c5487d9c02b:21849:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6549010Z [1673472794.368651] [7c5487d9c02b:21849:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6549260Z [1673472794.356323] [7c5487d9c02b:21850:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6549542Z [1673472794.369590] [7c5487d9c02b:21850:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6549778Z [1673472794.369590] [7c5487d9c02b:21850:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6549882Z ok (6.815s) 2023-01-11T21:44:28.6549902Z 2023-01-11T21:44:28.6550170Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6550281Z Ran 1 test in 6.815s 2023-01-11T21:44:28.6550299Z 2023-01-11T21:44:28.6550391Z OK 2023-01-11T21:44:28.6550410Z 2023-01-11T21:44:28.6550534Z Generating XML reports... 
2023-01-11T21:44:28.6550974Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213309.xml 2023-01-11T21:44:28.6551323Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6551499Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6551874Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6552108Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6552128Z 2023-01-11T21:44:28.6552239Z Running tests... 2023-01-11T21:44:28.6552505Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6552815Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6553071Z test_ddp_namedtuple (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6553289Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 21967 2023-01-11T21:44:28.6553485Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 21968 2023-01-11T21:44:28.6553906Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6554089Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6554470Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6554659Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6555017Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6555189Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6555559Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6555727Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6555969Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6556209Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6556605Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6556994Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6557220Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6557445Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6557715Z [1673472803.950715] [7c5487d9c02b:21968:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6557944Z [1673472803.963126] [7c5487d9c02b:21968:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6558237Z [1673472803.963126] [7c5487d9c02b:21968:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6558489Z [1673472803.942436] [7c5487d9c02b:21967:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6558717Z [1673472803.956034] [7c5487d9c02b:21967:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6558951Z [1673472803.956034] [7c5487d9c02b:21967:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6559054Z ok (6.815s) 2023-01-11T21:44:28.6559073Z 2023-01-11T21:44:28.6559338Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6559449Z Ran 1 test in 6.815s 2023-01-11T21:44:28.6559468Z 2023-01-11T21:44:28.6559562Z OK 2023-01-11T21:44:28.6559581Z 2023-01-11T21:44:28.6559706Z Generating XML reports... 
2023-01-11T21:44:28.6560150Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213318.xml 2023-01-11T21:44:28.6560544Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6560725Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6561104Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6561293Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6561313Z 2023-01-11T21:44:28.6561421Z Running tests... 2023-01-11T21:44:28.6561679Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6561986Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6562246Z test_ddp_new_tensor_in_fwd (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6562450Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 22081 2023-01-11T21:44:28.6562668Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 22082 2023-01-11T21:44:28.6563034Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6563207Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6563582Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6563771Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6564130Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6564300Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6564674Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6564842Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6565088Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6565328Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6565722Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6566113Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6566340Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6566565Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6566887Z [1673472813.199185] [7c5487d9c02b:22082:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6567120Z [1673472813.212518] [7c5487d9c02b:22082:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6567336Z [1673472813.212518] [7c5487d9c02b:22082:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6568108Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2023-01-11T21:44:28.6568382Z [1673472813.196130] [7c5487d9c02b:22081:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6568653Z [1673472813.209772] [7c5487d9c02b:22081:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6568893Z [1673472813.209772] [7c5487d9c02b:22081:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6569662Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. 
This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2023-01-11T21:44:28.6569904Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6570140Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6570243Z ok (6.605s) 2023-01-11T21:44:28.6570263Z 2023-01-11T21:44:28.6570533Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6570644Z Ran 1 test in 6.605s 2023-01-11T21:44:28.6570664Z 2023-01-11T21:44:28.6570757Z OK 2023-01-11T21:44:28.6570776Z 2023-01-11T21:44:28.6570901Z Generating XML reports... 2023-01-11T21:44:28.6571324Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213327.xml 2023-01-11T21:44:28.6571693Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6571872Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6572249Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6572442Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6572461Z 2023-01-11T21:44:28.6572569Z Running tests... 
2023-01-11T21:44:28.6572830Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6573137Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6573398Z test_ddp_new_tensor_in_fwd_static_graph (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6574138Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78338 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.628s) 2023-01-11T21:44:28.6574226Z 2023-01-11T21:44:28.6574473Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6574587Z Ran 1 test in 1.628s 2023-01-11T21:44:28.6574607Z 2023-01-11T21:44:28.6574713Z OK (skipped=1) 2023-01-11T21:44:28.6574732Z 2023-01-11T21:44:28.6574859Z Generating XML reports... 2023-01-11T21:44:28.6575299Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213337.xml 2023-01-11T21:44:28.6575664Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6575838Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6576211Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6576381Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6576422Z 2023-01-11T21:44:28.6576512Z Running tests... 
2023-01-11T21:44:28.6577015Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6577403Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6577688Z test_ddp_profiling_autograd_profiler (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6578430Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77342 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.621s) 2023-01-11T21:44:28.6578451Z 2023-01-11T21:44:28.6578712Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6578825Z Ran 1 test in 1.622s 2023-01-11T21:44:28.6578845Z 2023-01-11T21:44:28.6578957Z OK (skipped=1) 2023-01-11T21:44:28.6578976Z 2023-01-11T21:44:28.6579099Z Generating XML reports... 2023-01-11T21:44:28.6579526Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213341.xml 2023-01-11T21:44:28.6579893Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6580068Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6580443Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6580633Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6580652Z 2023-01-11T21:44:28.6580759Z Running tests... 
2023-01-11T21:44:28.6581016Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6581322Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6581581Z test_ddp_profiling_torch_profiler (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6581803Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 22267 2023-01-11T21:44:28.6582017Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 22268 2023-01-11T21:44:28.6582385Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6582558Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6582934Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6583123Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6583482Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6583727Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6584087Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6584275Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6584519Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6584762Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6585158Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6585549Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6585776Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6586004Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6586322Z [1673472830.589149] [7c5487d9c02b:22267:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6586539Z [1673472830.610882] [7c5487d9c02b:22267:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6586774Z [1673472830.610882] [7c5487d9c02b:22267:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6587109Z STAGE:2023-01-11 21:33:51 22267:22267 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6587378Z [1673472830.590205] [7c5487d9c02b:22268:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6587607Z [1673472830.604538] [7c5487d9c02b:22268:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6587849Z [1673472830.604538] [7c5487d9c02b:22268:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6588179Z STAGE:2023-01-11 21:33:51 22268:22268 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6588411Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6588645Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T21:44:28.6589185Z STAGE:2023-01-11 21:33:51 22268:22268 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:33:51 22267:22267 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6589206Z 2023-01-11T21:44:28.6589533Z STAGE:2023-01-11 21:33:51 22268:22268 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6589880Z STAGE:2023-01-11 21:33:51 22267:22267 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6590655Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2023-01-11T21:44:28.6591423Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. 
(function operator()) 2023-01-11T21:44:28.6591849Z STAGE:2023-01-11 21:33:52 22268:22268 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6592171Z STAGE:2023-01-11 21:33:52 22267:22267 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6592503Z STAGE:2023-01-11 21:33:52 22268:22268 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6592826Z STAGE:2023-01-11 21:33:52 22267:22267 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6593166Z STAGE:2023-01-11 21:33:52 22268:22268 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6593506Z STAGE:2023-01-11 21:33:52 22267:22267 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6593611Z ok (7.320s) 2023-01-11T21:44:28.6593630Z 2023-01-11T21:44:28.6593893Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6594003Z Ran 1 test in 7.320s 2023-01-11T21:44:28.6594023Z 2023-01-11T21:44:28.6594142Z OK 2023-01-11T21:44:28.6594162Z 2023-01-11T21:44:28.6594290Z Generating XML reports... 2023-01-11T21:44:28.6594736Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213345.xml 2023-01-11T21:44:28.6595104Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6595279Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6595657Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6595847Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6595871Z 2023-01-11T21:44:28.6595982Z Running tests... 
2023-01-11T21:44:28.6596223Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6596533Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6596796Z test_ddp_python_error_logged (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6597014Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 22389 2023-01-11T21:44:28.6597229Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 22390 2023-01-11T21:44:28.6597594Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6597768Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6598146Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6598336Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6598678Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6598853Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6599226Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6599411Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6599655Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6599895Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6600292Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6600686Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6600978Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6601190Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6601465Z [1673472840.441908] [7c5487d9c02b:22390:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6601694Z [1673472840.455215] [7c5487d9c02b:22390:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6601931Z [1673472840.455215] [7c5487d9c02b:22390:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6602199Z [1673472840.434654] [7c5487d9c02b:22389:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6602429Z [1673472840.448330] [7c5487d9c02b:22389:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6602708Z [1673472840.448330] [7c5487d9c02b:22389:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6602817Z ok (6.095s) 2023-01-11T21:44:28.6602837Z 2023-01-11T21:44:28.6603101Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6603196Z Ran 1 test in 6.095s 2023-01-11T21:44:28.6603233Z 2023-01-11T21:44:28.6603308Z OK 2023-01-11T21:44:28.6603327Z 2023-01-11T21:44:28.6603451Z Generating XML reports... 
2023-01-11T21:44:28.6603894Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213355.xml 2023-01-11T21:44:28.6604259Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6604438Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6604814Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6605006Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6605025Z 2023-01-11T21:44:28.6605135Z Running tests... 2023-01-11T21:44:28.6605375Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6605684Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6605957Z test_ddp_returns_tensor_with_no_grad (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6606695Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78595 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.633s) 2023-01-11T21:44:28.6606718Z 2023-01-11T21:44:28.6606979Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6607092Z Ran 1 test in 1.633s 2023-01-11T21:44:28.6607112Z 2023-01-11T21:44:28.6607218Z OK (skipped=1) 2023-01-11T21:44:28.6607237Z 2023-01-11T21:44:28.6607361Z Generating XML reports... 
2023-01-11T21:44:28.6607806Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213403.xml 2023-01-11T21:44:28.6608170Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6608327Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6608702Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6608892Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6608961Z 2023-01-11T21:44:28.6609073Z Running tests... 2023-01-11T21:44:28.6609333Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6609646Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6609927Z test_ddp_shared_grad_acc_unused_params (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6610145Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 22537 2023-01-11T21:44:28.6610341Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 22538 2023-01-11T21:44:28.6610706Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6610880Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6611252Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6611444Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6611845Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6612022Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6612392Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6612578Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6612802Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6613043Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6613440Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6613834Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6614066Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6614291Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6615190Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1911: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2023-01-11T21:44:28.6615302Z warnings.warn( 2023-01-11T21:44:28.6616206Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1911: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2023-01-11T21:44:28.6616320Z warnings.warn( 2023-01-11T21:44:28.6616707Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6616957Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6617231Z [1673472853.285111] [7c5487d9c02b:22537:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.6617462Z [1673472853.298753] [7c5487d9c02b:22537:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6617702Z [1673472853.298753] [7c5487d9c02b:22537:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6618064Z [1673472853.293922] [7c5487d9c02b:22538:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6618288Z [1673472853.307159] [7c5487d9c02b:22538:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6618521Z [1673472853.307159] [7c5487d9c02b:22538:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6618625Z ok (6.648s) 2023-01-11T21:44:28.6618645Z 2023-01-11T21:44:28.6618920Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6619014Z Ran 1 test in 6.649s 2023-01-11T21:44:28.6619033Z 2023-01-11T21:44:28.6619126Z OK 2023-01-11T21:44:28.6619145Z 2023-01-11T21:44:28.6619270Z Generating XML reports... 2023-01-11T21:44:28.6619712Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213408.xml 2023-01-11T21:44:28.6620084Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6620320Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6620707Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6620897Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6620917Z 2023-01-11T21:44:28.6621006Z Running tests... 
2023-01-11T21:44:28.6621267Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6621573Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6621847Z test_ddp_static_graph_nested_types (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6622587Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77625 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.587s) 2023-01-11T21:44:28.6622608Z 2023-01-11T21:44:28.6622865Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6622976Z Ran 1 test in 1.587s 2023-01-11T21:44:28.6622996Z 2023-01-11T21:44:28.6623102Z OK (skipped=1) 2023-01-11T21:44:28.6623121Z 2023-01-11T21:44:28.6623245Z Generating XML reports... 2023-01-11T21:44:28.6623683Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213417.xml 2023-01-11T21:44:28.6624031Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6624206Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6624583Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6624774Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6624793Z 2023-01-11T21:44:28.6624900Z Running tests... 
2023-01-11T21:44:28.6625159Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6625467Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6625733Z test_ddp_sync_bn_training_vs_eval (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6625951Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 22689 2023-01-11T21:44:28.6626149Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 22690 2023-01-11T21:44:28.6626514Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6626747Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6627129Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6627318Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6627676Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6627849Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6628217Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6628386Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6628629Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6628875Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6629315Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6629713Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6629939Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6630159Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6630433Z [1673472866.616838] [7c5487d9c02b:22690:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6630662Z [1673472866.630260] [7c5487d9c02b:22690:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6630885Z [1673472866.630260] [7c5487d9c02b:22690:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6631224Z STAGE:2023-01-11 21:34:27 22690:22690 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6631494Z [1673472866.613680] [7c5487d9c02b:22689:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6631721Z [1673472866.627260] [7c5487d9c02b:22689:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6631955Z [1673472866.627260] [7c5487d9c02b:22689:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6632288Z STAGE:2023-01-11 21:34:27 22689:22689 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6632522Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:44:28.6632758Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T21:44:28.6633093Z STAGE:2023-01-11 21:34:27 22690:22690 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6633419Z STAGE:2023-01-11 21:34:27 22689:22689 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6633745Z STAGE:2023-01-11 21:34:27 22689:22689 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6634087Z STAGE:2023-01-11 21:34:27 22690:22690 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6634411Z STAGE:2023-01-11 21:34:27 22689:22689 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6634802Z STAGE:2023-01-11 21:34:28 22689:22689 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6635145Z STAGE:2023-01-11 21:34:28 22689:22689 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6635308Z ok (7.450s) 2023-01-11T21:44:28.6635328Z 2023-01-11T21:44:28.6635594Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6635707Z Ran 1 test in 7.451s 2023-01-11T21:44:28.6635729Z 2023-01-11T21:44:28.6635803Z OK 2023-01-11T21:44:28.6635840Z 2023-01-11T21:44:28.6635945Z Generating XML reports... 
2023-01-11T21:44:28.6636389Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213421.xml 2023-01-11T21:44:28.6636754Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6636928Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6637301Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6637489Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6637512Z 2023-01-11T21:44:28.6637619Z Running tests... 2023-01-11T21:44:28.6637878Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6638213Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6638482Z test_ddp_sync_module_states (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6638700Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 22811 2023-01-11T21:44:28.6638915Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 22812 2023-01-11T21:44:28.6639282Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6639457Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6639826Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6640020Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6640363Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6640536Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6640902Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6641089Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6641331Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6641572Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6641967Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6642361Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6642590Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6642797Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6643069Z [1673472876.599030] [7c5487d9c02b:22811:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6643299Z [1673472876.612613] [7c5487d9c02b:22811:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6643533Z [1673472876.612613] [7c5487d9c02b:22811:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6643799Z [1673472876.606920] [7c5487d9c02b:22812:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6644077Z [1673472876.620274] [7c5487d9c02b:22812:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6644312Z [1673472876.620274] [7c5487d9c02b:22812:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6644415Z ok (6.155s) 2023-01-11T21:44:28.6644434Z 2023-01-11T21:44:28.6644702Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6644813Z Ran 1 test in 6.156s 2023-01-11T21:44:28.6644832Z 2023-01-11T21:44:28.6644905Z OK 2023-01-11T21:44:28.6644924Z 2023-01-11T21:44:28.6645046Z Generating XML reports... 
2023-01-11T21:44:28.6645487Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213431.xml 2023-01-11T21:44:28.6645851Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6646029Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6646444Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6646637Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6646657Z 2023-01-11T21:44:28.6646766Z Running tests... 2023-01-11T21:44:28.6647010Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6647318Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6647585Z test_ddp_uneven_input_exception (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6647802Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 22925 2023-01-11T21:44:28.6648016Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 22926 2023-01-11T21:44:28.6648386Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6648563Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6648935Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6649125Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6649462Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6649633Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6650006Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6650191Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6650434Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6650677Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6651077Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6651469Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6651693Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6651899Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6652170Z [1673472885.299472] [7c5487d9c02b:22926:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6652400Z [1673472885.312838] [7c5487d9c02b:22926:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6652690Z [1673472885.312838] [7c5487d9c02b:22926:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6652959Z [1673472885.294520] [7c5487d9c02b:22925:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6653184Z [1673472885.308180] [7c5487d9c02b:22925:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6653416Z [1673472885.308180] [7c5487d9c02b:22925:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6653520Z ok (6.130s) 2023-01-11T21:44:28.6653539Z 2023-01-11T21:44:28.6653857Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6653952Z Ran 1 test in 6.130s 2023-01-11T21:44:28.6653992Z 2023-01-11T21:44:28.6654070Z OK 2023-01-11T21:44:28.6654089Z 2023-01-11T21:44:28.6654214Z Generating XML reports... 
2023-01-11T21:44:28.6654714Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213440.xml 2023-01-11T21:44:28.6655087Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6655263Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6655638Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6655826Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6655845Z 2023-01-11T21:44:28.6655954Z Running tests... 2023-01-11T21:44:28.6656193Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6656501Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6657011Z test_ddp_uneven_input_join_disable (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6657767Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78684 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.650s) 2023-01-11T21:44:28.6657788Z 2023-01-11T21:44:28.6658046Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6658162Z Ran 1 test in 1.650s 2023-01-11T21:44:28.6658180Z 2023-01-11T21:44:28.6658288Z OK (skipped=1) 2023-01-11T21:44:28.6658307Z 2023-01-11T21:44:28.6658431Z Generating XML reports... 
2023-01-11T21:44:28.6658869Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213448.xml 2023-01-11T21:44:28.6659234Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6659393Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6659771Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6659959Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6659979Z 2023-01-11T21:44:28.6660086Z Running tests... 2023-01-11T21:44:28.6660344Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6660651Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6660907Z test_ddp_uneven_inputs (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6661639Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/75648 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.635s) 2023-01-11T21:44:28.6661752Z 2023-01-11T21:44:28.6662029Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6662123Z Ran 1 test in 1.635s 2023-01-11T21:44:28.6662142Z 2023-01-11T21:44:28.6662250Z OK (skipped=1) 2023-01-11T21:44:28.6662268Z 2023-01-11T21:44:28.6662392Z Generating XML reports... 
2023-01-11T21:44:28.6662832Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213452.xml 2023-01-11T21:44:28.6663196Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6663369Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6663743Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6663937Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6663956Z 2023-01-11T21:44:28.6664062Z Running tests... 2023-01-11T21:44:28.6664361Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6664681Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6664964Z test_ddp_uneven_inputs_stop_iteration_sync_bn (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6665695Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78113 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.612s) 2023-01-11T21:44:28.6665716Z 2023-01-11T21:44:28.6665975Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6666090Z Ran 1 test in 1.612s 2023-01-11T21:44:28.6666109Z 2023-01-11T21:44:28.6666215Z OK (skipped=1) 2023-01-11T21:44:28.6666234Z 2023-01-11T21:44:28.6666360Z Generating XML reports... 
2023-01-11T21:44:28.6666793Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213457.xml 2023-01-11T21:44:28.6667156Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6667312Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6667684Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6667872Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6667891Z 2023-01-11T21:44:28.6667998Z Running tests... 2023-01-11T21:44:28.6668253Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6668562Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6668856Z test_ddp_unused_params_rebuild_buckets_exception (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6669074Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 23141 2023-01-11T21:44:28.6669271Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 23142 2023-01-11T21:44:28.6669638Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6669812Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6670188Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6670377Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6670788Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6670964Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6671336Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6671523Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6671750Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6671991Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6672386Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6672777Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6673008Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6673276Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6673554Z [1673472906.480870] [7c5487d9c02b:23141:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6673782Z [1673472906.494668] [7c5487d9c02b:23141:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6674019Z [1673472906.494668] [7c5487d9c02b:23141:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6674271Z [1673472906.485592] [7c5487d9c02b:23142:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6674499Z [1673472906.498960] [7c5487d9c02b:23142:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6674744Z [1673472906.498960] [7c5487d9c02b:23142:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6674847Z ok (6.632s) 2023-01-11T21:44:28.6674867Z 2023-01-11T21:44:28.6675131Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6675244Z Ran 1 test in 6.633s 2023-01-11T21:44:28.6675264Z 2023-01-11T21:44:28.6675355Z OK 2023-01-11T21:44:28.6675374Z 2023-01-11T21:44:28.6675496Z Generating XML reports... 
2023-01-11T21:44:28.6675939Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213501.xml 2023-01-11T21:44:28.6676286Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6676461Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6676840Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6677031Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6677051Z 2023-01-11T21:44:28.6677159Z Running tests... 2023-01-11T21:44:28.6677419Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6677729Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6677996Z test_ddp_zero_output_features (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6678217Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 23259 2023-01-11T21:44:28.6678413Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 23260 2023-01-11T21:44:28.6678774Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6679003Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6679386Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6679575Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6679938Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6680112Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6680483Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6680651Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6680892Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6681134Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6681576Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6681973Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6682200Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6682575Z /opt/conda/lib/python3.10/site-packages/torch/nn/init.py:405: UserWarning: Initializing zero-element tensors is a no-op 2023-01-11T21:44:28.6682826Z warnings.warn("Initializing zero-element tensors is a no-op") 2023-01-11T21:44:28.6683049Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6683394Z /opt/conda/lib/python3.10/site-packages/torch/nn/init.py:405: UserWarning: Initializing zero-element tensors is a no-op 2023-01-11T21:44:28.6683651Z warnings.warn("Initializing zero-element tensors is a no-op") 2023-01-11T21:44:28.6683926Z [1673472915.672374] [7c5487d9c02b:23259:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6684156Z [1673472915.686176] [7c5487d9c02b:23259:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6684394Z [1673472915.686176] [7c5487d9c02b:23259:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6684662Z [1673472915.675916] [7c5487d9c02b:23260:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.6684887Z [1673472915.689319] [7c5487d9c02b:23260:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6685121Z [1673472915.689319] [7c5487d9c02b:23260:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6685226Z ok (6.119s) 2023-01-11T21:44:28.6685247Z 2023-01-11T21:44:28.6685515Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6685609Z Ran 1 test in 6.119s 2023-01-11T21:44:28.6685629Z 2023-01-11T21:44:28.6685721Z OK 2023-01-11T21:44:28.6685740Z 2023-01-11T21:44:28.6685864Z Generating XML reports... 2023-01-11T21:44:28.6686304Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213510.xml 2023-01-11T21:44:28.6686671Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6686845Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6687219Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6687468Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6687488Z 2023-01-11T21:44:28.6687578Z Running tests... 2023-01-11T21:44:28.6687848Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6688157Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6688412Z test_destroy_full_group (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6688629Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 23373 2023-01-11T21:44:28.6688844Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 23374 2023-01-11T21:44:28.6689210Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6689384Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6689740Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6689932Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6690335Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6690510Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6690888Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6691075Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6691318Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6691559Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6691955Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6692334Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6692568Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6692807Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:44:28.6693030Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6693262Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:44:28.6693655Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.6694039Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.6694144Z ok (4.256s) 2023-01-11T21:44:28.6694164Z 2023-01-11T21:44:28.6694426Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6694520Z Ran 1 test in 4.256s 2023-01-11T21:44:28.6694542Z 2023-01-11T21:44:28.6694634Z OK 2023-01-11T21:44:28.6694653Z 2023-01-11T21:44:28.6694776Z Generating XML reports... 2023-01-11T21:44:28.6695219Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213519.xml 2023-01-11T21:44:28.6695585Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6695759Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6696131Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6696318Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6696388Z 2023-01-11T21:44:28.6696501Z Running tests... 
2023-01-11T21:44:28.6696991Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6697314Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6697564Z test_destroy_group (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6697781Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 23476 2023-01-11T21:44:28.6697996Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 23477 2023-01-11T21:44:28.6698364Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6698538Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6698912Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6699087Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6699523Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6699708Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6700080Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6700268Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6700513Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6700756Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6701148Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6701541Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6701752Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6701990Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:44:28.6702216Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6702454Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:44:28.6702840Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.6703229Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.6703329Z ok (4.345s) 2023-01-11T21:44:28.6703349Z 2023-01-11T21:44:28.6703616Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6703726Z Ran 1 test in 4.345s 2023-01-11T21:44:28.6703746Z 2023-01-11T21:44:28.6703819Z OK 2023-01-11T21:44:28.6703838Z 2023-01-11T21:44:28.6703964Z Generating XML reports... 
2023-01-11T21:44:28.6704407Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213525.xml 2023-01-11T21:44:28.6704772Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6704944Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6705318Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6705508Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6705527Z 2023-01-11T21:44:28.6705636Z Running tests... 2023-01-11T21:44:28.6705952Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6706259Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6706538Z test_detect_ddp_is_actually_static (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6707278Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78767 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.628s) 2023-01-11T21:44:28.6707298Z 2023-01-11T21:44:28.6707559Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6707669Z Ran 1 test in 1.629s 2023-01-11T21:44:28.6707688Z 2023-01-11T21:44:28.6707794Z OK (skipped=1) 2023-01-11T21:44:28.6707813Z 2023-01-11T21:44:28.6707936Z Generating XML reports... 
2023-01-11T21:44:28.6708377Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213532.xml 2023-01-11T21:44:28.6708789Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6708953Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6709329Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6709520Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6709539Z 2023-01-11T21:44:28.6709647Z Running tests... 2023-01-11T21:44:28.6709905Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6710211Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6710483Z test_different_graph_across_ranks (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6711220Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78748 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.608s) 2023-01-11T21:44:28.6711241Z 2023-01-11T21:44:28.6711497Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6711590Z Ran 1 test in 1.609s 2023-01-11T21:44:28.6711628Z 2023-01-11T21:44:28.6711717Z OK (skipped=1) 2023-01-11T21:44:28.6711735Z 2023-01-11T21:44:28.6711860Z Generating XML reports... 
2023-01-11T21:44:28.6712298Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213536.xml 2023-01-11T21:44:28.6712661Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6712839Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6713212Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6713403Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6713422Z 2023-01-11T21:44:28.6713531Z Running tests... 2023-01-11T21:44:28.6713772Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6714078Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6714347Z test_dump_DDP_relevant_env_vars (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6714563Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 23647 2023-01-11T21:44:28.6714780Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 23648 2023-01-11T21:44:28.6715146Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6715372Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6715753Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6715941Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6716287Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6716461Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6716827Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6717014Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6717257Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6717501Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6717938Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6718335Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6718543Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6718769Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6718872Z ok (4.240s) 2023-01-11T21:44:28.6718891Z 2023-01-11T21:44:28.6719153Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6719264Z Ran 1 test in 4.240s 2023-01-11T21:44:28.6719284Z 2023-01-11T21:44:28.6719375Z OK 2023-01-11T21:44:28.6719397Z 2023-01-11T21:44:28.6719522Z Generating XML reports... 2023-01-11T21:44:28.6719964Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213541.xml 2023-01-11T21:44:28.6720332Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6720488Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6720860Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6721049Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6721068Z 2023-01-11T21:44:28.6721175Z Running tests... 2023-01-11T21:44:28.6721434Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6721739Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6721998Z test_gather (__main__.TestDistBackendWithSpawn) ... 
skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T21:44:28.6722017Z 2023-01-11T21:44:28.6722277Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6722369Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6722407Z 2023-01-11T21:44:28.6722495Z OK (skipped=1) 2023-01-11T21:44:28.6722514Z 2023-01-11T21:44:28.6722637Z Generating XML reports... 2023-01-11T21:44:28.6723078Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213547.xml 2023-01-11T21:44:28.6723441Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6723613Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6723983Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6724236Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6724255Z 2023-01-11T21:44:28.6724364Z Running tests... 2023-01-11T21:44:28.6724612Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6724919Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6725182Z test_gather_checks (__main__.TestDistBackendWithSpawn) ... skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T21:44:28.6725202Z 2023-01-11T21:44:28.6725458Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6725569Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6725588Z 2023-01-11T21:44:28.6725695Z OK (skipped=1) 2023-01-11T21:44:28.6725714Z 2023-01-11T21:44:28.6725836Z Generating XML reports... 
2023-01-11T21:44:28.6726273Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213550.xml 2023-01-11T21:44:28.6726642Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6726797Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6727254Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6727449Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6727469Z 2023-01-11T21:44:28.6727576Z Running tests... 2023-01-11T21:44:28.6727835Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6728144Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6728394Z test_gather_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA gather (0.002s) 2023-01-11T21:44:28.6728415Z 2023-01-11T21:44:28.6728670Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6728788Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6728808Z 2023-01-11T21:44:28.6728896Z OK (skipped=1) 2023-01-11T21:44:28.6728915Z 2023-01-11T21:44:28.6729036Z Generating XML reports... 
2023-01-11T21:44:28.6729475Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213552.xml 2023-01-11T21:44:28.6729837Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6730012Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6730384Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6730575Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6730595Z 2023-01-11T21:44:28.6730702Z Running tests... 2023-01-11T21:44:28.6730941Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6731251Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6731522Z test_gather_full_group (__main__.TestDistBackendWithSpawn) ... skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T21:44:28.6731541Z 2023-01-11T21:44:28.6731797Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6731908Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6731927Z 2023-01-11T21:44:28.6732035Z OK (skipped=1) 2023-01-11T21:44:28.6732054Z 2023-01-11T21:44:28.6732176Z Generating XML reports... 
2023-01-11T21:44:28.6732612Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213555.xml 2023-01-11T21:44:28.6732973Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6733129Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6733560Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6733748Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6733771Z 2023-01-11T21:44:28.6733879Z Running tests... 2023-01-11T21:44:28.6734138Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6734447Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6734707Z test_gather_group (__main__.TestDistBackendWithSpawn) ... skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T21:44:28.6734727Z 2023-01-11T21:44:28.6734981Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6735093Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6735112Z 2023-01-11T21:44:28.6735200Z OK (skipped=1) 2023-01-11T21:44:28.6735219Z 2023-01-11T21:44:28.6735342Z Generating XML reports... 
2023-01-11T21:44:28.6735780Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213557.xml 2023-01-11T21:44:28.6736187Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6736368Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6736965Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6737157Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6737178Z 2023-01-11T21:44:28.6737287Z Running tests... 2023-01-11T21:44:28.6737535Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6737838Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6738100Z test_gather_object (__main__.TestDistBackendWithSpawn) ... skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T21:44:28.6738125Z 2023-01-11T21:44:28.6738386Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6738496Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6738519Z 2023-01-11T21:44:28.6738627Z OK (skipped=1) 2023-01-11T21:44:28.6738646Z 2023-01-11T21:44:28.6738768Z Generating XML reports... 
2023-01-11T21:44:28.6739205Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213559.xml 2023-01-11T21:44:28.6739567Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6739724Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6740100Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6740288Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6740311Z 2023-01-11T21:44:28.6740418Z Running tests... 2023-01-11T21:44:28.6740677Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6740986Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6741259Z test_gather_object_subgroup (__main__.TestDistBackendWithSpawn) ... skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T21:44:28.6741278Z 2023-01-11T21:44:28.6741535Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6741648Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6741667Z 2023-01-11T21:44:28.6741755Z OK (skipped=1) 2023-01-11T21:44:28.6741774Z 2023-01-11T21:44:28.6741896Z Generating XML reports... 
2023-01-11T21:44:28.6742331Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213602.xml 2023-01-11T21:44:28.6742693Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6742956Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6743336Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6743526Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6743546Z 2023-01-11T21:44:28.6743653Z Running tests... 2023-01-11T21:44:28.6743911Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6744198Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6744442Z test_get_backend (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6744659Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 23981 2023-01-11T21:44:28.6744873Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 23982 2023-01-11T21:44:28.6745239Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6745474Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6745865Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6746053Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6746394Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6746565Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6746937Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6747125Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6747372Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6747611Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6748010Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6748402Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6748630Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6748850Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:44:28.6749072Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6749304Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:44:28.6749700Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.6750089Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.6750191Z ok (4.345s) 2023-01-11T21:44:28.6750211Z 2023-01-11T21:44:28.6750475Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6750587Z Ran 1 test in 4.345s 2023-01-11T21:44:28.6750606Z 2023-01-11T21:44:28.6750699Z OK 2023-01-11T21:44:28.6750718Z 2023-01-11T21:44:28.6750825Z Generating XML reports... 2023-01-11T21:44:28.6751266Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213604.xml 2023-01-11T21:44:28.6751632Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6751857Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6752235Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6752427Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6752447Z 2023-01-11T21:44:28.6752556Z Running tests... 
2023-01-11T21:44:28.6752815Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6753103Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6753375Z test_get_future (__main__.TestDistBackendWithSpawn) ... skip: get_future is only supported on mpi, nccl and gloo (0.002s) 2023-01-11T21:44:28.6753394Z 2023-01-11T21:44:28.6753651Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6753811Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6753831Z 2023-01-11T21:44:28.6753943Z OK (skipped=1) 2023-01-11T21:44:28.6753966Z 2023-01-11T21:44:28.6754090Z Generating XML reports... 2023-01-11T21:44:28.6754531Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213611.xml 2023-01-11T21:44:28.6754948Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6755129Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6755485Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6755671Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6755690Z 2023-01-11T21:44:28.6755798Z Running tests... 2023-01-11T21:44:28.6756059Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6756367Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6756610Z test_get_rank (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6756829Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 24117 2023-01-11T21:44:28.6757047Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 24118 2023-01-11T21:44:28.6757393Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6757567Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6757938Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6758127Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6758484Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6758655Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6759027Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6759214Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6759458Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6759679Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6760075Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6760467Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6760694Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6760918Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6761075Z ok (4.465s) 2023-01-11T21:44:28.6761095Z 2023-01-11T21:44:28.6761362Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6761478Z Ran 1 test in 4.465s 2023-01-11T21:44:28.6761497Z 2023-01-11T21:44:28.6761590Z OK 2023-01-11T21:44:28.6761609Z 2023-01-11T21:44:28.6761715Z Generating XML reports... 2023-01-11T21:44:28.6762160Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213613.xml 2023-01-11T21:44:28.6762528Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6762700Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6763074Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6763265Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6763287Z 2023-01-11T21:44:28.6763397Z Running tests... 2023-01-11T21:44:28.6763657Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6763988Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6764259Z test_get_rank_size_full_group (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6764478Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 24220 2023-01-11T21:44:28.6764695Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 24221 2023-01-11T21:44:28.6765065Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6765238Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6765609Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6765803Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6766165Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6766319Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6766688Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6766874Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6767116Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6767356Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6767752Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6768144Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6768372Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6768613Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:44:28.6768815Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6769045Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:44:28.6769433Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.6769821Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.6769973Z ok (4.351s) 2023-01-11T21:44:28.6769993Z 2023-01-11T21:44:28.6770258Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6770372Z Ran 1 test in 4.352s 2023-01-11T21:44:28.6770392Z 2023-01-11T21:44:28.6770487Z OK 2023-01-11T21:44:28.6770506Z 2023-01-11T21:44:28.6770613Z Generating XML reports... 2023-01-11T21:44:28.6771052Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213620.xml 2023-01-11T21:44:28.6771418Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6771592Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6771964Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6772152Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6772175Z 2023-01-11T21:44:28.6772283Z Running tests... 
2023-01-11T21:44:28.6772548Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6772899Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6773145Z test_get_rank_size_group (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6773360Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 24323 2023-01-11T21:44:28.6773574Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 24324 2023-01-11T21:44:28.6773941Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6774115Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6774488Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6774680Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6775042Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6775199Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6775567Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6775750Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6775993Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6776233Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6776851Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6777259Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6777493Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6777736Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:44:28.6777937Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6778176Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:44:28.6778569Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.6778957Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.6779059Z ok (4.322s) 2023-01-11T21:44:28.6779079Z 2023-01-11T21:44:28.6779339Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6779535Z Ran 1 test in 4.322s 2023-01-11T21:44:28.6779554Z 2023-01-11T21:44:28.6779646Z OK 2023-01-11T21:44:28.6779665Z 2023-01-11T21:44:28.6779793Z Generating XML reports... 
2023-01-11T21:44:28.6780222Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213627.xml 2023-01-11T21:44:28.6780587Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6780764Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6781139Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6781332Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6781352Z 2023-01-11T21:44:28.6781460Z Running tests... 2023-01-11T21:44:28.6781722Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6782033Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6782336Z test_invalid_static_graph (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6782563Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 24426 2023-01-11T21:44:28.6782778Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 24427 2023-01-11T21:44:28.6783148Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6783322Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6783698Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6783888Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6784249Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6784422Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6784777Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6784966Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6785207Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6785446Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6785839Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6786229Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6786459Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6786687Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6786958Z [1673472999.974406] [7c5487d9c02b:24427:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6787171Z [1673472999.987638] [7c5487d9c02b:24427:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6787409Z [1673472999.987638] [7c5487d9c02b:24427:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6787676Z [1673472999.970059] [7c5487d9c02b:24426:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6787901Z [1673472999.983792] [7c5487d9c02b:24426:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6788210Z [1673472999.983792] [7c5487d9c02b:24426:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6788314Z ok (6.658s) 2023-01-11T21:44:28.6788334Z 2023-01-11T21:44:28.6788600Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6788712Z Ran 1 test in 6.658s 2023-01-11T21:44:28.6788731Z 2023-01-11T21:44:28.6788823Z OK 2023-01-11T21:44:28.6788842Z 2023-01-11T21:44:28.6788946Z Generating XML reports... 
2023-01-11T21:44:28.6789387Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213634.xml 2023-01-11T21:44:28.6789754Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6789930Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6790309Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6790498Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6790563Z 2023-01-11T21:44:28.6790676Z Running tests... 2023-01-11T21:44:28.6790939Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6791246Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6791464Z test_irecv (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6791682Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 24544 2023-01-11T21:44:28.6791896Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 24545 2023-01-11T21:44:28.6792260Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6792438Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6792812Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6793004Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6793363Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6793517Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6793885Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6794068Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6794312Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6794706Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6794949Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6795338Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6795565Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6795791Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6796044Z [1673473007.744503] [7c5487d9c02b:24544:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6796273Z [1673473009.164761] [7c5487d9c02b:24544:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6796508Z [1673473009.164761] [7c5487d9c02b:24544:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6796847Z [1673473007.744526] [7c5487d9c02b:24545:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6797075Z [1673473009.191429] [7c5487d9c02b:24545:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6797308Z [1673473009.191429] [7c5487d9c02b:24545:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6797410Z ok (6.136s) 2023-01-11T21:44:28.6797430Z 2023-01-11T21:44:28.6797700Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6797811Z Ran 1 test in 6.137s 2023-01-11T21:44:28.6797830Z 2023-01-11T21:44:28.6797921Z OK 2023-01-11T21:44:28.6797940Z 2023-01-11T21:44:28.6798047Z Generating XML reports... 
2023-01-11T21:44:28.6798488Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213643.xml 2023-01-11T21:44:28.6798907Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6799090Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6799467Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6799658Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6799678Z 2023-01-11T21:44:28.6799789Z Running tests... 2023-01-11T21:44:28.6800049Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6800339Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6800574Z test_isend (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6800798Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 24654 2023-01-11T21:44:28.6801012Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 24655 2023-01-11T21:44:28.6801378Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6801551Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6801927Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6802117Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6802471Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6802624Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6802993Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6803185Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6803433Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6803677Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6804074Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6804465Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6804693Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6804900Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6805172Z [1673473016.480527] [7c5487d9c02b:24654:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6805459Z [1673473017.921524] [7c5487d9c02b:24654:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6805693Z [1673473017.921524] [7c5487d9c02b:24654:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6805957Z [1673473016.482943] [7c5487d9c02b:24655:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6806177Z [1673473017.889098] [7c5487d9c02b:24655:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6806412Z [1673473017.889098] [7c5487d9c02b:24655:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6806515Z ok (6.163s) 2023-01-11T21:44:28.6806535Z 2023-01-11T21:44:28.6806805Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6806918Z Ran 1 test in 6.164s 2023-01-11T21:44:28.6806937Z 2023-01-11T21:44:28.6807011Z OK 2023-01-11T21:44:28.6807029Z 2023-01-11T21:44:28.6807196Z Generating XML reports... 
2023-01-11T21:44:28.6807643Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213652.xml 2023-01-11T21:44:28.6808019Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6808193Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6808567Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6808755Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6808775Z 2023-01-11T21:44:28.6808884Z Running tests... 2023-01-11T21:44:28.6809148Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6809439Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6809708Z test_isend_autograd_profiler (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6809927Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 24764 2023-01-11T21:44:28.6810143Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 24765 2023-01-11T21:44:28.6810509Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6810681Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6811051Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6811237Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6811581Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6811755Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6812121Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6812308Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6812552Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6812794Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6813190Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6813583Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6813864Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6814189Z STAGE:2023-01-11 21:37:05 24765:24765 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6814413Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6814744Z STAGE:2023-01-11 21:37:05 24764:24764 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6815015Z [1673473025.150372] [7c5487d9c02b:24765:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6815248Z [1673473026.765771] [7c5487d9c02b:24765:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6815483Z [1673473026.765771] [7c5487d9c02b:24765:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6815821Z STAGE:2023-01-11 21:37:07 24765:24765 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6816209Z STAGE:2023-01-11 21:37:07 24765:24765 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6816479Z [1673473025.129831] [7c5487d9c02b:24764:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.6816928Z [1673473026.794312] [7c5487d9c02b:24764:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6817147Z [1673473026.794312] [7c5487d9c02b:24764:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6817491Z STAGE:2023-01-11 21:37:07 24764:24764 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6817833Z STAGE:2023-01-11 21:37:07 24764:24764 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6817940Z ok (6.735s) 2023-01-11T21:44:28.6817960Z 2023-01-11T21:44:28.6818222Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6818338Z Ran 1 test in 6.736s 2023-01-11T21:44:28.6818358Z 2023-01-11T21:44:28.6818450Z OK 2023-01-11T21:44:28.6818469Z 2023-01-11T21:44:28.6818593Z Generating XML reports... 2023-01-11T21:44:28.6819018Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213701.xml 2023-01-11T21:44:28.6819384Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6819558Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6819931Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6820119Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6820143Z 2023-01-11T21:44:28.6820251Z Running tests... 2023-01-11T21:44:28.6820514Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6820827Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6821086Z test_isend_torch_profiler (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6821285Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 24878 2023-01-11T21:44:28.6821499Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 24879 2023-01-11T21:44:28.6821864Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6822037Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6822409Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6822683Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6823053Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6823227Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6823576Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6823763Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6824007Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6824247Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6824643Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6825037Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6825324Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6825668Z STAGE:2023-01-11 21:37:14 24878:24878 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6825895Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6826201Z STAGE:2023-01-11 21:37:14 24879:24879 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6826475Z [1673473034.456974] [7c5487d9c02b:24878:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6826705Z [1673473036.080898] [7c5487d9c02b:24878:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6826946Z [1673473036.080898] [7c5487d9c02b:24878:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6827280Z STAGE:2023-01-11 21:37:16 24878:24878 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6827546Z [1673473034.476964] [7c5487d9c02b:24879:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.6827770Z [1673473036.064909] [7c5487d9c02b:24879:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6828003Z [1673473036.064909] [7c5487d9c02b:24879:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6828339Z STAGE:2023-01-11 21:37:16 24879:24879 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6828683Z STAGE:2023-01-11 21:37:16 24878:24878 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6829010Z STAGE:2023-01-11 21:37:16 24879:24879 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6829112Z ok (6.541s) 2023-01-11T21:44:28.6829136Z 2023-01-11T21:44:28.6829398Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6829509Z Ran 1 test in 6.541s 2023-01-11T21:44:28.6829529Z 2023-01-11T21:44:28.6829620Z OK 2023-01-11T21:44:28.6829638Z 2023-01-11T21:44:28.6829761Z Generating XML reports... 2023-01-11T21:44:28.6830202Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213710.xml 2023-01-11T21:44:28.6830569Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6830744Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6831099Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6831343Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6831363Z 2023-01-11T21:44:28.6831476Z Running tests... 
2023-01-11T21:44:28.6831740Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6832050Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6832332Z test_monitored_barrier_allreduce_hang (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6832550Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 24992 2023-01-11T21:44:28.6832763Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 24993 2023-01-11T21:44:28.6833109Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6833284Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6833663Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6833898Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6834265Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6834439Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6834810Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6834999Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6835241Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6835464Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6835865Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6836261Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6836491Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6836726Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:44:28.6836952Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6837191Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:44:28.6837583Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.6837973Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.6838195Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T21:44:28.6838432Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T21:44:28.6838821Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T21:44:28.6839208Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 
2023-01-11T21:44:28.6839437Z [E ProcessGroupGloo.cpp:138] [Rank 0]: Rank 1 failed to pass monitoredBarrier in 100 ms 2023-01-11T21:44:28.6839539Z ok (22.822s) 2023-01-11T21:44:28.6839559Z 2023-01-11T21:44:28.6839822Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6839934Z Ran 1 test in 22.822s 2023-01-11T21:44:28.6840005Z 2023-01-11T21:44:28.6840099Z OK 2023-01-11T21:44:28.6840119Z 2023-01-11T21:44:28.6840224Z Generating XML reports... 2023-01-11T21:44:28.6840671Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213719.xml 2023-01-11T21:44:28.6841039Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6841215Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6841589Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6841779Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6841799Z 2023-01-11T21:44:28.6841908Z Running tests... 2023-01-11T21:44:28.6842168Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6842457Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6842759Z test_monitored_barrier_allreduce_hang_wait_all_ranks (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6843026Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 25113 2023-01-11T21:44:28.6843249Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 25114 2023-01-11T21:44:28.6843616Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6843795Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6844170Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6844359Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6844719Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6844876Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6845250Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6845436Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6845678Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6845919Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6846309Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6846698Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6846925Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6847167Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:44:28.6847372Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6847604Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:44:28.6847996Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.6848382Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.6850049Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T21:44:28.6850322Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T21:44:28.6850727Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T21:44:28.6851215Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T21:44:28.6851448Z [E ProcessGroupGloo.cpp:2803] [Rank 0]: Rank 1 failed to pass monitoredBarrier in 100 ms 2023-01-11T21:44:28.6851656Z [E ProcessGroupGloo.cpp:138] [Rank 0]: Ranks 1 failed to pass monitoredBarrier in 100 ms 2023-01-11T21:44:28.6851761Z ok (22.930s) 2023-01-11T21:44:28.6851781Z 2023-01-11T21:44:28.6852046Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6852161Z Ran 1 test in 22.930s 2023-01-11T21:44:28.6852180Z 2023-01-11T21:44:28.6852272Z OK 2023-01-11T21:44:28.6852291Z 2023-01-11T21:44:28.6852415Z Generating XML reports... 
2023-01-11T21:44:28.6852859Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213744.xml 2023-01-11T21:44:28.6853232Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6853388Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6853859Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6854061Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6854081Z 2023-01-11T21:44:28.6854192Z Running tests... 2023-01-11T21:44:28.6854458Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6854765Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6855172Z test_monitored_barrier_failure_order (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T21:44:28.6855192Z 2023-01-11T21:44:28.6855454Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6855573Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6855592Z 2023-01-11T21:44:28.6855681Z OK (skipped=1) 2023-01-11T21:44:28.6855699Z 2023-01-11T21:44:28.6855825Z Generating XML reports... 
2023-01-11T21:44:28.6856270Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213810.xml 2023-01-11T21:44:28.6856859Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6857043Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6857431Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6857622Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6857641Z 2023-01-11T21:44:28.6857750Z Running tests... 2023-01-11T21:44:28.6858009Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6858303Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6858733Z test_monitored_barrier_gloo (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T21:44:28.6858754Z 2023-01-11T21:44:28.6859015Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6859128Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6859147Z 2023-01-11T21:44:28.6859254Z OK (skipped=1) 2023-01-11T21:44:28.6859273Z 2023-01-11T21:44:28.6859396Z Generating XML reports... 
2023-01-11T21:44:28.6859837Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213812.xml 2023-01-11T21:44:28.6860204Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6860379Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6860838Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6861032Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6861051Z 2023-01-11T21:44:28.6861161Z Running tests... 2023-01-11T21:44:28.6861421Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6861727Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6862144Z test_monitored_barrier_gloo_rank_0_timeout (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T21:44:28.6862163Z 2023-01-11T21:44:28.6862423Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6862533Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6862552Z 2023-01-11T21:44:28.6862640Z OK (skipped=1) 2023-01-11T21:44:28.6862681Z 2023-01-11T21:44:28.6862786Z Generating XML reports... 
2023-01-11T21:44:28.6863227Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213815.xml 2023-01-11T21:44:28.6863653Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6863835Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6864211Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6864403Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6864425Z 2023-01-11T21:44:28.6864534Z Running tests... 2023-01-11T21:44:28.6864795Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6865082Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6865495Z test_monitored_barrier_gloo_subgroup (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T21:44:28.6865515Z 2023-01-11T21:44:28.6865774Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6865886Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6865905Z 2023-01-11T21:44:28.6866014Z OK (skipped=1) 2023-01-11T21:44:28.6866033Z 2023-01-11T21:44:28.6866155Z Generating XML reports... 
2023-01-11T21:44:28.6866588Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213817.xml 2023-01-11T21:44:28.6866953Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6867126Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6867478Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6867669Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6867688Z 2023-01-11T21:44:28.6867796Z Running tests... 2023-01-11T21:44:28.6868055Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6868363Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6868765Z test_monitored_barrier_wait_all_ranks (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T21:44:28.6868785Z 2023-01-11T21:44:28.6869037Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6869149Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6869169Z 2023-01-11T21:44:28.6869275Z OK (skipped=1) 2023-01-11T21:44:28.6869295Z 2023-01-11T21:44:28.6869400Z Generating XML reports... 
2023-01-11T21:44:28.6869837Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213820.xml 2023-01-11T21:44:28.6870292Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6870473Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6870845Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6871035Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6871054Z 2023-01-11T21:44:28.6871165Z Running tests... 2023-01-11T21:44:28.6871420Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6871707Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6872103Z test_nccl_backend_bool_allgather (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'nccl'} (0.002s) 2023-01-11T21:44:28.6872126Z 2023-01-11T21:44:28.6872382Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6872491Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6872511Z 2023-01-11T21:44:28.6872616Z OK (skipped=1) 2023-01-11T21:44:28.6872681Z 2023-01-11T21:44:28.6872809Z Generating XML reports... 
2023-01-11T21:44:28.6873243Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213822.xml 2023-01-11T21:44:28.6873609Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6873782Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6874134Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6874327Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6874347Z 2023-01-11T21:44:28.6874460Z Running tests... 2023-01-11T21:44:28.6874715Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6875022Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6875423Z test_nccl_backend_bool_allreduce (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'nccl'} (0.002s) 2023-01-11T21:44:28.6875443Z 2023-01-11T21:44:28.6875704Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6875816Z Ran 1 test in 0.003s 2023-01-11T21:44:28.6875835Z 2023-01-11T21:44:28.6875942Z OK (skipped=1) 2023-01-11T21:44:28.6875961Z 2023-01-11T21:44:28.6876065Z Generating XML reports... 
2023-01-11T21:44:28.6876500Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213824.xml 2023-01-11T21:44:28.6876865Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6877043Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6877419Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6877608Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6877628Z 2023-01-11T21:44:28.6877736Z Running tests... 2023-01-11T21:44:28.6877991Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6878297Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6878672Z test_nccl_backend_bool_broadcast (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'nccl'} (0.002s) 2023-01-11T21:44:28.6878691Z 2023-01-11T21:44:28.6878947Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6879057Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6879126Z 2023-01-11T21:44:28.6879236Z OK (skipped=1) 2023-01-11T21:44:28.6879255Z 2023-01-11T21:44:28.6879378Z Generating XML reports... 
2023-01-11T21:44:28.6879817Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213827.xml 2023-01-11T21:44:28.6880183Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6880358Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6880729Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6880902Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6880922Z 2023-01-11T21:44:28.6881030Z Running tests... 2023-01-11T21:44:28.6881287Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6881592Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6881988Z test_nccl_backend_bool_reduce (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'nccl'} (0.003s) 2023-01-11T21:44:28.6882008Z 2023-01-11T21:44:28.6882312Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6882428Z Ran 1 test in 0.003s 2023-01-11T21:44:28.6882447Z 2023-01-11T21:44:28.6882556Z OK (skipped=1) 2023-01-11T21:44:28.6882574Z 2023-01-11T21:44:28.6882678Z Generating XML reports... 
2023-01-11T21:44:28.6883119Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213829.xml 2023-01-11T21:44:28.6883483Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6883657Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6884032Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6884227Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6884246Z 2023-01-11T21:44:28.6884355Z Running tests... 2023-01-11T21:44:28.6884617Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6884924Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6885198Z test_nccl_high_priority_stream (__main__.TestDistBackendWithSpawn) ... skip: Only NCCL backend supports high priority stream (0.002s) 2023-01-11T21:44:28.6885235Z 2023-01-11T21:44:28.6885474Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6885584Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6885603Z 2023-01-11T21:44:28.6885710Z OK (skipped=1) 2023-01-11T21:44:28.6885729Z 2023-01-11T21:44:28.6885853Z Generating XML reports... 
2023-01-11T21:44:28.6886289Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213832.xml 2023-01-11T21:44:28.6886660Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6886835Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6887206Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6887377Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6887396Z 2023-01-11T21:44:28.6887505Z Running tests... 2023-01-11T21:44:28.6887763Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6888067Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6888315Z test_new_subgroups (__main__.TestDistBackendWithSpawn) ... skip: Test requires world size of 4 (0.002s) 2023-01-11T21:44:28.6888386Z 2023-01-11T21:44:28.6888648Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6888761Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6888780Z 2023-01-11T21:44:28.6888887Z OK (skipped=1) 2023-01-11T21:44:28.6888909Z 2023-01-11T21:44:28.6889031Z Generating XML reports... 
2023-01-11T21:44:28.6889449Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213834.xml 2023-01-11T21:44:28.6889814Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6889989Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6890360Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6890549Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6890568Z 2023-01-11T21:44:28.6890679Z Running tests... 2023-01-11T21:44:28.6890934Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6891236Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6891531Z test_new_subgroups_by_enumeration (__main__.TestDistBackendWithSpawn) ... skip: Test requires world size of 4 (0.002s) 2023-01-11T21:44:28.6891571Z 2023-01-11T21:44:28.6891818Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6891929Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6891948Z 2023-01-11T21:44:28.6892055Z OK (skipped=1) 2023-01-11T21:44:28.6892074Z 2023-01-11T21:44:28.6892197Z Generating XML reports... 
2023-01-11T21:44:28.6892640Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213836.xml 2023-01-11T21:44:28.6893006Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6893184Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6893560Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6893733Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6893770Z 2023-01-11T21:44:28.6893859Z Running tests... 2023-01-11T21:44:28.6894119Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6894425Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6894730Z test_new_subgroups_by_enumeration_input_rank_exceeds_world_size (__main__.TestDistBackendWithSpawn) ... skip: Test requires world size of 4 (0.002s) 2023-01-11T21:44:28.6894751Z 2023-01-11T21:44:28.6895009Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6895119Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6895141Z 2023-01-11T21:44:28.6895247Z OK (skipped=1) 2023-01-11T21:44:28.6895266Z 2023-01-11T21:44:28.6895388Z Generating XML reports... 
2023-01-11T21:44:28.6895808Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213839.xml 2023-01-11T21:44:28.6896173Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6896346Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6896944Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6897140Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6897159Z 2023-01-11T21:44:28.6897271Z Running tests... 2023-01-11T21:44:28.6897532Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6897839Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6898205Z test_new_subgroups_by_enumeration_negative_input_rank (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6898428Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 25663 2023-01-11T21:44:28.6898645Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 25664 2023-01-11T21:44:28.6899013Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6899188Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6899561Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6899752Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6900108Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6900281Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6900687Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6900886Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6901130Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6901374Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6901774Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6902165Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6902395Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6902627Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6902730Z ok (4.237s) 2023-01-11T21:44:28.6902750Z 2023-01-11T21:44:28.6902997Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6903109Z Ran 1 test in 4.237s 2023-01-11T21:44:28.6903128Z 2023-01-11T21:44:28.6903221Z OK 2023-01-11T21:44:28.6903240Z 2023-01-11T21:44:28.6903363Z Generating XML reports... 2023-01-11T21:44:28.6903802Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213841.xml 2023-01-11T21:44:28.6904168Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6904345Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6904716Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6904889Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6904926Z 2023-01-11T21:44:28.6905016Z Running tests... 2023-01-11T21:44:28.6905280Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6905588Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6905878Z test_new_subgroups_group_size_exceeds_world_size (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6906096Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 25766 2023-01-11T21:44:28.6906311Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 25767 2023-01-11T21:44:28.6906680Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6906853Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6907270Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6907464Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6907825Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6908000Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6908367Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6908554Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6908799Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6909042Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6909421Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6909858Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6910093Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6910320Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6910422Z ok (4.343s) 2023-01-11T21:44:28.6910442Z 2023-01-11T21:44:28.6910706Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6910817Z Ran 1 test in 4.343s 2023-01-11T21:44:28.6910837Z 2023-01-11T21:44:28.6910930Z OK 2023-01-11T21:44:28.6910948Z 2023-01-11T21:44:28.6911073Z Generating XML reports... 2023-01-11T21:44:28.6911493Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213848.xml 2023-01-11T21:44:28.6911863Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6912042Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6912417Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6912605Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6912625Z 2023-01-11T21:44:28.6912733Z Running tests... 2023-01-11T21:44:28.6912992Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6913300Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6913574Z test_new_subgroups_overlap_not_allowed (__main__.TestDistBackendWithSpawn) ... 
skip: Test requires world size of 4 (0.002s) 2023-01-11T21:44:28.6913598Z 2023-01-11T21:44:28.6913835Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6913946Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6913965Z 2023-01-11T21:44:28.6914073Z OK (skipped=1) 2023-01-11T21:44:28.6914095Z 2023-01-11T21:44:28.6914218Z Generating XML reports... 2023-01-11T21:44:28.6914660Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213855.xml 2023-01-11T21:44:28.6915022Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6915194Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6915565Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6915758Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6915778Z 2023-01-11T21:44:28.6915868Z Running tests... 2023-01-11T21:44:28.6916197Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6916502Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6916799Z test_new_subgroups_world_size_not_divisible_by_group_size (__main__.TestDistBackendWithSpawn) ... skip: Test requires world size of 4 (0.002s) 2023-01-11T21:44:28.6916820Z 2023-01-11T21:44:28.6917073Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6917175Z Ran 1 test in 0.002s 2023-01-11T21:44:28.6917195Z 2023-01-11T21:44:28.6917291Z OK (skipped=1) 2023-01-11T21:44:28.6917309Z 2023-01-11T21:44:28.6917423Z Generating XML reports... 
2023-01-11T21:44:28.6917840Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213857.xml 2023-01-11T21:44:28.6918202Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6918371Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6918782Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6918971Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6918991Z 2023-01-11T21:44:28.6919092Z Running tests... 2023-01-11T21:44:28.6919353Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6919662Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6919937Z test_output_unused_in_loss_dict_module (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6920668Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78112 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.626s) 2023-01-11T21:44:28.6920712Z 2023-01-11T21:44:28.6920956Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6921067Z Ran 1 test in 1.626s 2023-01-11T21:44:28.6921087Z 2023-01-11T21:44:28.6921193Z OK (skipped=1) 2023-01-11T21:44:28.6921212Z 2023-01-11T21:44:28.6921335Z Generating XML reports... 
2023-01-11T21:44:28.6921774Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213900.xml 2023-01-11T21:44:28.6922137Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6922312Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6922687Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6922878Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6922898Z 2023-01-11T21:44:28.6922988Z Running tests... 2023-01-11T21:44:28.6923249Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6923676Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6923957Z test_output_unused_in_loss_tuple_module (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6924176Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 25969 2023-01-11T21:44:28.6924392Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 25970 2023-01-11T21:44:28.6924761Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6924935Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6925291Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6925548Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6926014Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6926283Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6926663Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6926854Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6927099Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6927343Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6927742Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6928123Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6928407Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6928639Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6928913Z [1673473149.468112] [7c5487d9c02b:25969:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6929146Z [1673473149.481614] [7c5487d9c02b:25969:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6929384Z [1673473149.481614] [7c5487d9c02b:25969:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6929652Z [1673473149.471733] [7c5487d9c02b:25970:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6929885Z [1673473149.484950] [7c5487d9c02b:25970:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6930118Z [1673473149.484950] [7c5487d9c02b:25970:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6930220Z ok (6.659s) 2023-01-11T21:44:28.6930241Z 2023-01-11T21:44:28.6930489Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6930601Z Ran 1 test in 6.659s 2023-01-11T21:44:28.6930621Z 2023-01-11T21:44:28.6930712Z OK 2023-01-11T21:44:28.6930731Z 2023-01-11T21:44:28.6930855Z Generating XML reports... 
2023-01-11T21:44:28.6931297Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213904.xml 2023-01-11T21:44:28.6931664Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6931843Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6932220Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6932391Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6932431Z 2023-01-11T21:44:28.6932520Z Running tests... 2023-01-11T21:44:28.6932777Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6933085Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6933353Z test_periodic_model_averager (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6933571Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 26087 2023-01-11T21:44:28.6933841Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 26088 2023-01-11T21:44:28.6934211Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6934389Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6934803Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6934993Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6935355Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6935526Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6935896Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6936084Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6936332Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6936813Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6937221Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6937613Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6937841Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6938068Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6938339Z [1673473159.714639] [7c5487d9c02b:26087:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6938614Z [1673473159.715652] [7c5487d9c02b:26088:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6938846Z [1673473159.728742] [7c5487d9c02b:26087:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6939082Z [1673473159.728742] [7c5487d9c02b:26087:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6939301Z [1673473159.729429] [7c5487d9c02b:26088:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6939526Z [1673473159.729429] [7c5487d9c02b:26088:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6939610Z ok (7.202s) 2023-01-11T21:44:28.6939630Z 2023-01-11T21:44:28.6939896Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6940013Z Ran 1 test in 7.202s 2023-01-11T21:44:28.6940032Z 2023-01-11T21:44:28.6940124Z OK 2023-01-11T21:44:28.6940143Z 2023-01-11T21:44:28.6940266Z Generating XML reports... 
2023-01-11T21:44:28.6940714Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213913.xml 2023-01-11T21:44:28.6941083Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6941257Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6941632Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6941803Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6941822Z 2023-01-11T21:44:28.6941933Z Running tests... 2023-01-11T21:44:28.6942192Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6942581Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6942867Z test_periodic_model_averager_param_group (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6943092Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 26202 2023-01-11T21:44:28.6943309Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 26203 2023-01-11T21:44:28.6943675Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6943833Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6944206Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6944395Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6944753Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6944930Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6945352Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6945544Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6945791Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6946032Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6946412Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6946804Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6947031Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6947259Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6947532Z [1673473169.504006] [7c5487d9c02b:26202:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6947801Z [1673473169.506819] [7c5487d9c02b:26203:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6948030Z [1673473169.517945] [7c5487d9c02b:26202:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6948263Z [1673473169.517945] [7c5487d9c02b:26202:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6948489Z [1673473169.520424] [7c5487d9c02b:26203:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6948725Z [1673473169.520424] [7c5487d9c02b:26203:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6948810Z ok (7.312s) 2023-01-11T21:44:28.6948832Z 2023-01-11T21:44:28.6949099Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6949214Z Ran 1 test in 7.312s 2023-01-11T21:44:28.6949234Z 2023-01-11T21:44:28.6949325Z OK 2023-01-11T21:44:28.6949344Z 2023-01-11T21:44:28.6949467Z Generating XML reports... 
2023-01-11T21:44:28.6949908Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213923.xml 2023-01-11T21:44:28.6950274Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6950448Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6950802Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6951048Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6951068Z 2023-01-11T21:44:28.6951180Z Running tests... 2023-01-11T21:44:28.6951444Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6951750Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6952026Z test_post_localSGD_optimizer_parity (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6952767Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77123 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.591s) 2023-01-11T21:44:28.6952788Z 2023-01-11T21:44:28.6953047Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6953162Z Ran 1 test in 1.591s 2023-01-11T21:44:28.6953182Z 2023-01-11T21:44:28.6953289Z OK (skipped=1) 2023-01-11T21:44:28.6953308Z 2023-01-11T21:44:28.6953459Z Generating XML reports... 
2023-01-11T21:44:28.6953958Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213932.xml 2023-01-11T21:44:28.6954328Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6954503Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6954879Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6955069Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6955089Z 2023-01-11T21:44:28.6955196Z Running tests... 2023-01-11T21:44:28.6955457Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6955748Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6956040Z test_post_localSGD_optimizer_parity_grad_is_view (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6956773Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77292 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.650s) 2023-01-11T21:44:28.6956794Z 2023-01-11T21:44:28.6957050Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6957159Z Ran 1 test in 1.650s 2023-01-11T21:44:28.6957179Z 2023-01-11T21:44:28.6957283Z OK (skipped=1) 2023-01-11T21:44:28.6957302Z 2023-01-11T21:44:28.6957423Z Generating XML reports... 
2023-01-11T21:44:28.6957864Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213937.xml 2023-01-11T21:44:28.6958231Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6958405Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6958757Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6958946Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6958965Z 2023-01-11T21:44:28.6959075Z Running tests... 2023-01-11T21:44:28.6959336Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6959642Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6959948Z test_post_localSGD_optimizer_parity_with_hierarchical_sgd (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6960228Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 26385 2023-01-11T21:44:28.6960450Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 26386 2023-01-11T21:44:28.6960816Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6960972Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6961346Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6961536Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6961899Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6962074Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6962446Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6962633Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6962928Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6963157Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6963554Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6963948Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6964176Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6964402Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6964554Z skip: Need at least 4 CUDA devices (4.206s) 2023-01-11T21:44:28.6964574Z 2023-01-11T21:44:28.6964839Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6964953Z Ran 1 test in 4.206s 2023-01-11T21:44:28.6964972Z 2023-01-11T21:44:28.6965079Z OK (skipped=1) 2023-01-11T21:44:28.6965098Z 2023-01-11T21:44:28.6965202Z Generating XML reports... 2023-01-11T21:44:28.6965642Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213941.xml 2023-01-11T21:44:28.6966009Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6966183Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6966557Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6966747Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6966770Z 2023-01-11T21:44:28.6966876Z Running tests... 2023-01-11T21:44:28.6967137Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6967444Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6967747Z test_post_localSGD_optimizer_parity_with_hierarchical_sgd_grad_is_view (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6967965Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 26488 2023-01-11T21:44:28.6968178Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 26489 2023-01-11T21:44:28.6968547Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6968719Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6969093Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6969335Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6969700Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6969853Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6970226Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6970412Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6970657Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6970898Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6971292Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6971685Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6971960Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6972192Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6972323Z skip: Need at least 4 CUDA devices (4.263s) 2023-01-11T21:44:28.6972360Z 2023-01-11T21:44:28.6972605Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6972716Z Ran 1 test in 4.264s 2023-01-11T21:44:28.6972735Z 2023-01-11T21:44:28.6972842Z OK (skipped=1) 2023-01-11T21:44:28.6972861Z 2023-01-11T21:44:28.6972983Z Generating XML reports... 2023-01-11T21:44:28.6973424Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213948.xml 2023-01-11T21:44:28.6973799Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6973975Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6974349Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6974518Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6974538Z 2023-01-11T21:44:28.6974645Z Running tests... 2023-01-11T21:44:28.6974904Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6975212Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6975494Z test_post_localSGD_optimizer_step_reload (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6976238Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/84886 for platform(s) linux. 
If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.653s) 2023-01-11T21:44:28.6976261Z 2023-01-11T21:44:28.6976519Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6976901Z Ran 1 test in 1.653s 2023-01-11T21:44:28.6976922Z 2023-01-11T21:44:28.6977031Z OK (skipped=1) 2023-01-11T21:44:28.6977050Z 2023-01-11T21:44:28.6977155Z Generating XML reports... 2023-01-11T21:44:28.6977607Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213954.xml 2023-01-11T21:44:28.6977969Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6978142Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6978623Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6978813Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6978837Z 2023-01-11T21:44:28.6978946Z Running tests... 2023-01-11T21:44:28.6979208Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6979515Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6979758Z test_reduce_full_group_max (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6979975Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 26625 2023-01-11T21:44:28.6980191Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 26626 2023-01-11T21:44:28.6980554Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6980732Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6981101Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6981350Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6981719Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6981873Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6982239Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6982428Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6982670Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6982911Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6983312Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6983707Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6983937Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6984177Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:44:28.6984380Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6984613Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:44:28.6985007Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.6985394Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.6985728Z STAGE:2023-01-11 21:40:03 26625:26625 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6986052Z STAGE:2023-01-11 21:40:03 26626:26626 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6986326Z [1673473203.080502] [7c5487d9c02b:26626:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.6986557Z [1673473204.712709] [7c5487d9c02b:26626:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6986794Z [1673473204.712709] [7c5487d9c02b:26626:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6987062Z [1673473203.079784] [7c5487d9c02b:26625:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.6987329Z [1673473204.746132] [7c5487d9c02b:26625:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.6987569Z [1673473204.746132] [7c5487d9c02b:26625:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.6988123Z STAGE:2023-01-11 21:40:05 26626:26626 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:40:05 26625:26625 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6988144Z 2023-01-11T21:44:28.6988490Z STAGE:2023-01-11 21:40:05 26626:26626 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6988834Z STAGE:2023-01-11 21:40:05 26625:26625 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6989161Z STAGE:2023-01-11 21:40:05 26626:26626 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6989485Z STAGE:2023-01-11 21:40:05 26625:26625 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.6989860Z STAGE:2023-01-11 21:40:05 26626:26626 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6990193Z STAGE:2023-01-11 21:40:05 26625:26625 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.6990534Z STAGE:2023-01-11 21:40:05 26626:26626 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6990853Z STAGE:2023-01-11 21:40:05 26625:26625 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.6990956Z ok (6.657s) 2023-01-11T21:44:28.6990976Z 2023-01-11T21:44:28.6991241Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6991353Z Ran 1 test in 6.657s 2023-01-11T21:44:28.6991372Z 2023-01-11T21:44:28.6991464Z OK 2023-01-11T21:44:28.6991483Z 2023-01-11T21:44:28.6991611Z Generating XML reports... 
2023-01-11T21:44:28.6992053Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213959.xml 2023-01-11T21:44:28.6992423Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6992597Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6992954Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6993145Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6993164Z 2023-01-11T21:44:28.6993277Z Running tests... 2023-01-11T21:44:28.6993539Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.6993848Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.6994106Z test_reduce_full_group_min (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.6994330Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 26739 2023-01-11T21:44:28.6994548Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 26740 2023-01-11T21:44:28.6994896Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6995070Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6995446Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6995637Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6995997Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.6996170Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.6996596Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.6996788Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.6997031Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.6997255Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.6997650Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.6998040Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.6998266Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.6998503Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:44:28.6998727Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.6999005Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:44:28.6999401Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.6999788Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.7000100Z STAGE:2023-01-11 21:40:12 26740:26740 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.7000421Z STAGE:2023-01-11 21:40:12 26739:26739 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.7000693Z [1673473212.238637] [7c5487d9c02b:26740:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.7000930Z [1673473213.865325] [7c5487d9c02b:26740:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.7001169Z [1673473213.865325] [7c5487d9c02b:26740:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.7001502Z STAGE:2023-01-11 21:40:14 26740:26740 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.7001845Z STAGE:2023-01-11 21:40:14 26740:26740 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.7002116Z [1673473212.218606] [7c5487d9c02b:26739:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.7002342Z [1673473213.864451] [7c5487d9c02b:26739:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.7002575Z [1673473213.864451] [7c5487d9c02b:26739:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.7002890Z STAGE:2023-01-11 21:40:14 26739:26739 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.7003235Z STAGE:2023-01-11 21:40:14 26739:26739 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.7003559Z STAGE:2023-01-11 21:40:14 26739:26739 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.7003886Z STAGE:2023-01-11 21:40:14 26739:26739 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.7004227Z STAGE:2023-01-11 21:40:14 26739:26739 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.7004551Z STAGE:2023-01-11 21:40:14 26740:26740 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.7004880Z STAGE:2023-01-11 21:40:14 26740:26740 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.7005278Z STAGE:2023-01-11 21:40:14 26740:26740 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.7005381Z ok (6.637s) 2023-01-11T21:44:28.7005402Z 2023-01-11T21:44:28.7005652Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7005768Z Ran 1 test in 6.638s 2023-01-11T21:44:28.7005788Z 2023-01-11T21:44:28.7005879Z OK 2023-01-11T21:44:28.7005899Z 2023-01-11T21:44:28.7006024Z Generating XML reports... 
2023-01-11T21:44:28.7006465Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214008.xml 2023-01-11T21:44:28.7006831Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7007006Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7007380Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7007555Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7007592Z 2023-01-11T21:44:28.7007682Z Running tests... 2023-01-11T21:44:28.7008026Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7008342Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7008613Z test_reduce_full_group_product (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.7008829Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 26853 2023-01-11T21:44:28.7009043Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 26854 2023-01-11T21:44:28.7009410Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7009584Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7009945Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7010134Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7010499Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7010672Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7011038Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7011225Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7011468Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.7011708Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.7012085Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.7012483Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.7012713Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.7012950Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:44:28.7013173Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.7013410Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:44:28.7013801Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.7014188Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.7014577Z STAGE:2023-01-11 21:40:21 26854:26854 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.7014892Z STAGE:2023-01-11 21:40:21 26853:26853 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.7015167Z [1673473221.375316] [7c5487d9c02b:26854:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.7015398Z [1673473223.029763] [7c5487d9c02b:26854:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.7015636Z [1673473223.029763] [7c5487d9c02b:26854:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.7015902Z [1673473221.354622] [7c5487d9c02b:26853:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.7016132Z [1673473223.045446] [7c5487d9c02b:26853:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.7016409Z [1673473223.045446] [7c5487d9c02b:26853:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.7017149Z STAGE:2023-01-11 21:40:23 26854:26854 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:40:23 26853:26853 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.7017172Z 2023-01-11T21:44:28.7017523Z STAGE:2023-01-11 21:40:23 26854:26854 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.7017865Z STAGE:2023-01-11 21:40:23 26853:26853 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.7018189Z STAGE:2023-01-11 21:40:23 26854:26854 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.7018487Z STAGE:2023-01-11 21:40:23 26853:26853 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.7018819Z STAGE:2023-01-11 21:40:23 26854:26854 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.7019155Z STAGE:2023-01-11 21:40:23 26853:26853 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.7019497Z STAGE:2023-01-11 21:40:23 26854:26854 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.7019838Z STAGE:2023-01-11 21:40:23 26853:26853 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.7019939Z ok (6.605s) 2023-01-11T21:44:28.7019959Z 2023-01-11T21:44:28.7020222Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7020336Z Ran 1 test in 6.605s 2023-01-11T21:44:28.7020356Z 2023-01-11T21:44:28.7020447Z OK 2023-01-11T21:44:28.7020466Z 2023-01-11T21:44:28.7020573Z Generating XML reports... 
2023-01-11T21:44:28.7021015Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214017.xml 2023-01-11T21:44:28.7021389Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7021563Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7021939Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7022127Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7022147Z 2023-01-11T21:44:28.7022254Z Running tests... 2023-01-11T21:44:28.7022513Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7022802Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7023063Z test_reduce_full_group_sum (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.7023368Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 26967 2023-01-11T21:44:28.7023582Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 26968 2023-01-11T21:44:28.7023955Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7024130Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7024506Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7024694Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7025055Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7025209Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7025577Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7025764Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7026067Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.7026313Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.7026710Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.7027097Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.7027326Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.7027550Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.7027768Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:44:28.7028013Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:44:28.7028405Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.7028793Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.7029124Z STAGE:2023-01-11 21:40:30 26967:26967 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.7029449Z STAGE:2023-01-11 21:40:30 26968:26968 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.7029722Z [1673473230.368094] [7c5487d9c02b:26968:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.7029953Z [1673473231.985931] [7c5487d9c02b:26968:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.7030194Z [1673473231.985931] [7c5487d9c02b:26968:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.7030445Z [1673473230.368122] [7c5487d9c02b:26967:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.7030670Z [1673473232.015865] [7c5487d9c02b:26967:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.7030902Z [1673473232.015865] [7c5487d9c02b:26967:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.7031447Z STAGE:2023-01-11 21:40:32 26968:26968 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:40:32 26967:26967 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.7031468Z 2023-01-11T21:44:28.7031871Z STAGE:2023-01-11 21:40:32 26968:26968 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.7032215Z STAGE:2023-01-11 21:40:32 26967:26967 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.7032539Z STAGE:2023-01-11 21:40:32 26968:26968 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.7032861Z STAGE:2023-01-11 21:40:32 26967:26967 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.7033194Z STAGE:2023-01-11 21:40:32 26968:26968 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.7033519Z STAGE:2023-01-11 21:40:32 26967:26967 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.7033840Z STAGE:2023-01-11 21:40:32 26968:26968 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.7034179Z STAGE:2023-01-11 21:40:32 26967:26967 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.7034282Z ok (6.410s) 2023-01-11T21:44:28.7034303Z 2023-01-11T21:44:28.7034565Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7034719Z Ran 1 test in 6.410s 2023-01-11T21:44:28.7034741Z 2023-01-11T21:44:28.7034834Z OK 2023-01-11T21:44:28.7034854Z 2023-01-11T21:44:28.7034978Z Generating XML reports... 
2023-01-11T21:44:28.7035423Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214026.xml 2023-01-11T21:44:28.7035791Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7035947Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7036322Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7036512Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7036537Z 2023-01-11T21:44:28.7036645Z Running tests... 2023-01-11T21:44:28.7036904Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7037212Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7037466Z test_reduce_group_max (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.7037684Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 27081 2023-01-11T21:44:28.7037881Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 27082 2023-01-11T21:44:28.7038249Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7038423Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7038797Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7038988Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7039348Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7039520Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7039885Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7040073Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7040299Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.7040540Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.7040936Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.7041398Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.7041629Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.7041855Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.7042013Z skip: Skipped due to small world size. (4.189s) 2023-01-11T21:44:28.7042033Z 2023-01-11T21:44:28.7042293Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7042404Z Ran 1 test in 4.190s 2023-01-11T21:44:28.7042423Z 2023-01-11T21:44:28.7042512Z OK (skipped=1) 2023-01-11T21:44:28.7042531Z 2023-01-11T21:44:28.7042651Z Generating XML reports... 2023-01-11T21:44:28.7043086Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214035.xml 2023-01-11T21:44:28.7043452Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7043626Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7044043Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7044239Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7044259Z 2023-01-11T21:44:28.7044368Z Running tests... 2023-01-11T21:44:28.7044614Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7044919Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7045168Z test_reduce_group_min (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.7045381Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 27184 2023-01-11T21:44:28.7045593Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 27185 2023-01-11T21:44:28.7045961Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7046137Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7046505Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7046691Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7047032Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7047204Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7047572Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7047756Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7048000Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.7048241Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.7048637Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.7049029Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.7049254Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.7049461Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.7049615Z skip: Skipped due to small world size. (4.187s) 2023-01-11T21:44:28.7049636Z 2023-01-11T21:44:28.7049897Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7050060Z Ran 1 test in 4.187s 2023-01-11T21:44:28.7050080Z 2023-01-11T21:44:28.7050186Z OK (skipped=1) 2023-01-11T21:44:28.7050205Z 2023-01-11T21:44:28.7050325Z Generating XML reports... 2023-01-11T21:44:28.7050769Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214042.xml 2023-01-11T21:44:28.7051135Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7051291Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7051663Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7051850Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7051871Z 2023-01-11T21:44:28.7051976Z Running tests... 2023-01-11T21:44:28.7052233Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7052539Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7052843Z test_reduce_group_product (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.7053065Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 27287 2023-01-11T21:44:28.7053281Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 27288 2023-01-11T21:44:28.7053633Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7053851Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7054233Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7054420Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7054781Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7054957Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7055326Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7055509Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7055733Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.7055971Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.7056365Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.7056914Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.7057145Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.7057372Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.7057527Z skip: Skipped due to small world size. (4.143s) 2023-01-11T21:44:28.7057548Z 2023-01-11T21:44:28.7057817Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7057926Z Ran 1 test in 4.143s 2023-01-11T21:44:28.7057945Z 2023-01-11T21:44:28.7058034Z OK (skipped=1) 2023-01-11T21:44:28.7058052Z 2023-01-11T21:44:28.7058174Z Generating XML reports... 2023-01-11T21:44:28.7058617Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214048.xml 2023-01-11T21:44:28.7058975Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7059147Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7059608Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7059795Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7059819Z 2023-01-11T21:44:28.7059925Z Running tests... 2023-01-11T21:44:28.7060184Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7060474Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7060726Z test_reduce_group_sum (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.7060942Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 27390 2023-01-11T21:44:28.7061157Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 27391 2023-01-11T21:44:28.7061521Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7061698Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7062133Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7062329Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7062674Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7062844Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7063217Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7063401Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7063642Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.7063880Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.7064280Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.7064671Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.7064898Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.7065106Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.7065263Z skip: Skipped due to small world size. (4.259s) 2023-01-11T21:44:28.7065283Z 2023-01-11T21:44:28.7065543Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7065654Z Ran 1 test in 4.259s 2023-01-11T21:44:28.7065673Z 2023-01-11T21:44:28.7065778Z OK (skipped=1) 2023-01-11T21:44:28.7065797Z 2023-01-11T21:44:28.7065917Z Generating XML reports... 2023-01-11T21:44:28.7066359Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214055.xml 2023-01-11T21:44:28.7066725Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7066897Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7067252Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7067441Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7067461Z 2023-01-11T21:44:28.7067566Z Running tests... 2023-01-11T21:44:28.7067824Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7068126Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7068368Z test_reduce_max (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.7068637Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 27493 2023-01-11T21:44:28.7068856Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 27494 2023-01-11T21:44:28.7069207Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7069378Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7069746Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7069933Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7070291Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7070463Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7070831Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7071015Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7071279Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.7071524Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.7071918Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.7072308Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.7072534Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.7072868Z STAGE:2023-01-11 21:41:06 27494:27494 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.7073093Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.7073427Z STAGE:2023-01-11 21:41:06 27493:27493 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.7073700Z [1673473266.390565] [7c5487d9c02b:27494:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.7073931Z [1673473268.014020] [7c5487d9c02b:27494:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.7074151Z [1673473268.014020] [7c5487d9c02b:27494:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.7074417Z [1673473266.367078] [7c5487d9c02b:27493:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.7074643Z [1673473268.011481] [7c5487d9c02b:27493:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.7074883Z [1673473268.011481] [7c5487d9c02b:27493:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.7075433Z STAGE:2023-01-11 21:41:08 27494:27494 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:41:08 27493:27493 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.7075454Z 2023-01-11T21:44:28.7075797Z STAGE:2023-01-11 21:41:08 27494:27494 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.7076135Z STAGE:2023-01-11 21:41:08 27493:27493 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.7076459Z STAGE:2023-01-11 21:41:08 27494:27494 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.7076775Z STAGE:2023-01-11 21:41:08 27493:27493 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.7077156Z STAGE:2023-01-11 21:41:08 27494:27494 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.7077692Z STAGE:2023-01-11 21:41:08 27494:27494 ActivityProfilerController.cpp:310] Completed Stage: Post ProcessingSTAGE:2023-01-11 21:41:08 27493:27493 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.7077729Z 2023-01-11T21:44:28.7078050Z STAGE:2023-01-11 21:41:08 27493:27493 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.7078151Z ok (6.527s) 2023-01-11T21:44:28.7078170Z 2023-01-11T21:44:28.7078427Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7078538Z Ran 1 test in 6.527s 2023-01-11T21:44:28.7078557Z 2023-01-11T21:44:28.7078649Z OK 2023-01-11T21:44:28.7078668Z 2023-01-11T21:44:28.7078790Z Generating XML reports... 
2023-01-11T21:44:28.7079232Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214102.xml 2023-01-11T21:44:28.7079600Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7079819Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7080188Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7080374Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7080396Z 2023-01-11T21:44:28.7080504Z Running tests... 2023-01-11T21:44:28.7080764Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7081072Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7081311Z test_reduce_min (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.7081526Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 27607 2023-01-11T21:44:28.7081741Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 27608 2023-01-11T21:44:28.7082090Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7082264Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7082638Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7082822Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7083180Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7083349Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7083715Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7083901Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7084125Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.7084372Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.7084765Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.7085154Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.7085379Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.7085707Z STAGE:2023-01-11 21:41:15 27607:27607 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.7085930Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.7086304Z STAGE:2023-01-11 21:41:15 27608:27608 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.7086581Z [1673473275.426211] [7c5487d9c02b:27608:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.7086808Z [1673473277.060896] [7c5487d9c02b:27608:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.7087026Z [1673473277.060896] [7c5487d9c02b:27608:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.7087289Z [1673473275.424663] [7c5487d9c02b:27607:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.7087511Z [1673473277.067209] [7c5487d9c02b:27607:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.7087743Z [1673473277.067209] [7c5487d9c02b:27607:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.7088330Z STAGE:2023-01-11 21:41:17 27608:27608 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:41:17 27607:27607 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.7088353Z 2023-01-11T21:44:28.7088706Z STAGE:2023-01-11 21:41:17 27607:27607 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.7089047Z STAGE:2023-01-11 21:41:17 27608:27608 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.7089370Z STAGE:2023-01-11 21:41:17 27608:27608 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.7089686Z STAGE:2023-01-11 21:41:17 27607:27607 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.7090013Z STAGE:2023-01-11 21:41:17 27608:27608 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.7090325Z STAGE:2023-01-11 21:41:17 27607:27607 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.7090665Z STAGE:2023-01-11 21:41:17 27608:27608 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.7090998Z STAGE:2023-01-11 21:41:17 27607:27607 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.7091095Z ok (6.505s) 2023-01-11T21:44:28.7091114Z 2023-01-11T21:44:28.7091374Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7091485Z Ran 1 test in 6.505s 2023-01-11T21:44:28.7091505Z 2023-01-11T21:44:28.7091596Z OK 2023-01-11T21:44:28.7091615Z 2023-01-11T21:44:28.7091737Z Generating XML reports... 
2023-01-11T21:44:28.7092179Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214111.xml 2023-01-11T21:44:28.7092527Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7092705Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7093085Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7093274Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7093293Z 2023-01-11T21:44:28.7093400Z Running tests... 2023-01-11T21:44:28.7093656Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7093963Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7094239Z test_reduce_multigpu (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl backend supports reduce multigpu (0.002s) 2023-01-11T21:44:28.7094258Z 2023-01-11T21:44:28.7094514Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7094665Z Ran 1 test in 0.002s 2023-01-11T21:44:28.7094685Z 2023-01-11T21:44:28.7094792Z OK (skipped=1) 2023-01-11T21:44:28.7094812Z 2023-01-11T21:44:28.7094934Z Generating XML reports... 
2023-01-11T21:44:28.7095381Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214120.xml 2023-01-11T21:44:28.7095744Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7095913Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7096286Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7096470Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7096489Z 2023-01-11T21:44:28.7096744Z Running tests... 2023-01-11T21:44:28.7097014Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7097419Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7097676Z test_reduce_product (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.7097968Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 27754 2023-01-11T21:44:28.7098189Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 27755 2023-01-11T21:44:28.7098559Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7098733Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7099106Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7099277Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7099633Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7099808Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7100175Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7100358Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7100600Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.7100838Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.7101237Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.7101608Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.7102007Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.7102354Z STAGE:2023-01-11 21:41:26 27755:27755 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.7102580Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.7102909Z STAGE:2023-01-11 21:41:26 27754:27754 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.7103176Z [1673473286.976725] [7c5487d9c02b:27755:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.7103404Z [1673473288.608413] [7c5487d9c02b:27755:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.7103643Z [1673473288.608413] [7c5487d9c02b:27755:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.7103913Z [1673473286.955995] [7c5487d9c02b:27754:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.7104236Z [1673473288.611736] [7c5487d9c02b:27754:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.7104452Z [1673473288.611736] [7c5487d9c02b:27754:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.7105003Z STAGE:2023-01-11 21:41:28 27755:27755 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:41:28 27754:27754 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.7105025Z 2023-01-11T21:44:28.7105368Z STAGE:2023-01-11 21:41:28 27755:27755 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.7105711Z STAGE:2023-01-11 21:41:28 27754:27754 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.7106036Z STAGE:2023-01-11 21:41:29 27755:27755 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.7106353Z STAGE:2023-01-11 21:41:29 27754:27754 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.7106725Z STAGE:2023-01-11 21:41:29 27755:27755 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.7107061Z STAGE:2023-01-11 21:41:29 27754:27754 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.7107400Z STAGE:2023-01-11 21:41:29 27755:27755 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.7107740Z STAGE:2023-01-11 21:41:29 27754:27754 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.7107824Z ok (6.644s) 2023-01-11T21:44:28.7107844Z 2023-01-11T21:44:28.7108105Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7108217Z Ran 1 test in 6.644s 2023-01-11T21:44:28.7108236Z 2023-01-11T21:44:28.7108327Z OK 2023-01-11T21:44:28.7108349Z 2023-01-11T21:44:28.7108472Z Generating XML reports... 
2023-01-11T21:44:28.7108916Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214123.xml 2023-01-11T21:44:28.7109289Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7109464Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7109820Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7110009Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7110029Z 2023-01-11T21:44:28.7110135Z Running tests... 2023-01-11T21:44:28.7110392Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7110696Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7110987Z test_reduce_scatter_tensor_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA reduce_scatter_tensor (0.002s) 2023-01-11T21:44:28.7111007Z 2023-01-11T21:44:28.7111265Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7111371Z Ran 1 test in 0.002s 2023-01-11T21:44:28.7111390Z 2023-01-11T21:44:28.7111491Z OK (skipped=1) 2023-01-11T21:44:28.7111510Z 2023-01-11T21:44:28.7111629Z Generating XML reports... 
2023-01-11T21:44:28.7112047Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214132.xml 2023-01-11T21:44:28.7112418Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7112591Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7112960Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7113202Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7113222Z 2023-01-11T21:44:28.7113331Z Running tests... 2023-01-11T21:44:28.7113597Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7113903Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7114153Z test_reduce_scatter_v_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports reduce_scatter_v (0.003s) 2023-01-11T21:44:28.7114191Z 2023-01-11T21:44:28.7114432Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7114542Z Ran 1 test in 0.003s 2023-01-11T21:44:28.7114561Z 2023-01-11T21:44:28.7114667Z OK (skipped=1) 2023-01-11T21:44:28.7114686Z 2023-01-11T21:44:28.7114805Z Generating XML reports... 
2023-01-11T21:44:28.7115248Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214134.xml 2023-01-11T21:44:28.7115615Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7115834Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7116213Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7116384Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7116416Z 2023-01-11T21:44:28.7116505Z Running tests... 2023-01-11T21:44:28.7116758Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7117058Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7117298Z test_reduce_sum (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.7117514Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 27934 2023-01-11T21:44:28.7117729Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 27935 2023-01-11T21:44:28.7118093Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7118266Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7118624Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7118813Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7119171Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7119341Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7119707Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7119897Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7120138Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.7120382Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.7120759Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.7121148Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.7121374Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.7121705Z STAGE:2023-01-11 21:41:40 27934:27934 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.7121928Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.7122313Z STAGE:2023-01-11 21:41:40 27935:27935 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.7122589Z [1673473301.009989] [7c5487d9c02b:27934:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.7122816Z [1673473302.633845] [7c5487d9c02b:27934:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.7123049Z [1673473302.633845] [7c5487d9c02b:27934:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.7123316Z [1673473301.029981] [7c5487d9c02b:27935:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.7123524Z [1673473302.675289] [7c5487d9c02b:27935:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.7123756Z [1673473302.675289] [7c5487d9c02b:27935:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.7124352Z STAGE:2023-01-11 21:41:43 27934:27934 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:41:43 27935:27935 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.7124374Z 2023-01-11T21:44:28.7124724Z STAGE:2023-01-11 21:41:43 27935:27935 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.7125065Z STAGE:2023-01-11 21:41:43 27934:27934 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.7125390Z STAGE:2023-01-11 21:41:43 27934:27934 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.7125718Z STAGE:2023-01-11 21:41:43 27934:27934 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.7126057Z STAGE:2023-01-11 21:41:43 27934:27934 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.7126381Z STAGE:2023-01-11 21:41:43 27935:27935 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.7126711Z STAGE:2023-01-11 21:41:43 27935:27935 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.7127032Z STAGE:2023-01-11 21:41:43 27935:27935 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.7127131Z ok (6.768s) 2023-01-11T21:44:28.7127151Z 2023-01-11T21:44:28.7127411Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7127523Z Ran 1 test in 6.768s 2023-01-11T21:44:28.7127542Z 2023-01-11T21:44:28.7127632Z OK 2023-01-11T21:44:28.7127652Z 2023-01-11T21:44:28.7127771Z Generating XML reports... 
2023-01-11T21:44:28.7128213Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214137.xml 2023-01-11T21:44:28.7128576Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7128736Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7129112Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7129305Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7129325Z 2023-01-11T21:44:28.7129434Z Running tests... 2023-01-11T21:44:28.7129697Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7130002Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7130255Z test_reduce_sum_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA reduce (0.002s) 2023-01-11T21:44:28.7130275Z 2023-01-11T21:44:28.7130528Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7130640Z Ran 1 test in 0.002s 2023-01-11T21:44:28.7130710Z 2023-01-11T21:44:28.7130805Z OK (skipped=1) 2023-01-11T21:44:28.7130838Z 2023-01-11T21:44:28.7130943Z Generating XML reports... 
2023-01-11T21:44:28.7131388Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214146.xml 2023-01-11T21:44:28.7131751Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7131923Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7132294Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7132475Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7132495Z 2023-01-11T21:44:28.7132602Z Running tests... 2023-01-11T21:44:28.7132864Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7133152Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7133413Z test_reduce_sum_cuda_twice (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA reduce (0.002s) 2023-01-11T21:44:28.7133433Z 2023-01-11T21:44:28.7133734Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7133848Z Ran 1 test in 0.002s 2023-01-11T21:44:28.7133869Z 2023-01-11T21:44:28.7133975Z OK (skipped=1) 2023-01-11T21:44:28.7133994Z 2023-01-11T21:44:28.7134115Z Generating XML reports... 
2023-01-11T21:44:28.7134559Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214148.xml 2023-01-11T21:44:28.7134926Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7135096Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7135451Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7135647Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7135667Z 2023-01-11T21:44:28.7135773Z Running tests... 2023-01-11T21:44:28.7136032Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7136335Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7136765Z test_reduce_sum_twice (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.7136994Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 28114 2023-01-11T21:44:28.7137209Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 28115 2023-01-11T21:44:28.7137569Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7137734Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7138110Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7138301Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7138659Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7138830Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7139202Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7139389Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7139617Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.7139857Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.7140344Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.7140739Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.7140966Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.7141294Z STAGE:2023-01-11 21:41:54 28115:28115 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.7141518Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.7141843Z STAGE:2023-01-11 21:41:55 28114:28114 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.7142120Z [1673473315.106747] [7c5487d9c02b:28115:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.7142347Z [1673473316.725778] [7c5487d9c02b:28115:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.7142630Z [1673473316.725778] [7c5487d9c02b:28115:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.7142908Z [1673473315.086201] [7c5487d9c02b:28114:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.7143134Z [1673473316.752958] [7c5487d9c02b:28114:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.7143369Z [1673473316.752958] [7c5487d9c02b:28114:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.7143919Z STAGE:2023-01-11 21:41:57 28115:28115 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:41:57 28114:28114 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.7143944Z 2023-01-11T21:44:28.7144293Z STAGE:2023-01-11 21:41:57 28114:28114 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.7144636Z STAGE:2023-01-11 21:41:57 28115:28115 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.7144961Z STAGE:2023-01-11 21:41:57 28115:28115 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.7145275Z STAGE:2023-01-11 21:41:57 28114:28114 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.7145601Z STAGE:2023-01-11 21:41:57 28115:28115 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.7145911Z STAGE:2023-01-11 21:41:57 28114:28114 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.7146253Z STAGE:2023-01-11 21:41:57 28115:28115 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.7154384Z STAGE:2023-01-11 21:41:57 28114:28114 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.7154534Z ok (6.670s) 2023-01-11T21:44:28.7154557Z 2023-01-11T21:44:28.7154850Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7154965Z Ran 1 test in 6.670s 2023-01-11T21:44:28.7154985Z 2023-01-11T21:44:28.7155073Z OK 2023-01-11T21:44:28.7155092Z 2023-01-11T21:44:28.7155219Z Generating XML reports... 
2023-01-11T21:44:28.7155659Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214151.xml 2023-01-11T21:44:28.7156032Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7156208Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7156586Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7156775Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7156937Z 2023-01-11T21:44:28.7157050Z Running tests... 2023-01-11T21:44:28.7157321Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7157636Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7157896Z test_scatter (__main__.TestDistBackendWithSpawn) ... skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T21:44:28.7157916Z 2023-01-11T21:44:28.7158154Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7158266Z Ran 1 test in 0.002s 2023-01-11T21:44:28.7158285Z 2023-01-11T21:44:28.7158392Z OK (skipped=1) 2023-01-11T21:44:28.7158411Z 2023-01-11T21:44:28.7158534Z Generating XML reports... 
2023-01-11T21:44:28.7158980Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214200.xml 2023-01-11T21:44:28.7159348Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7159525Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7159950Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7160147Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7160167Z 2023-01-11T21:44:28.7160257Z Running tests... 2023-01-11T21:44:28.7160520Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7160828Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7161092Z test_scatter_checks (__main__.TestDistBackendWithSpawn) ... skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T21:44:28.7161112Z 2023-01-11T21:44:28.7161368Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7161482Z Ran 1 test in 0.002s 2023-01-11T21:44:28.7161501Z 2023-01-11T21:44:28.7161603Z OK (skipped=1) 2023-01-11T21:44:28.7161622Z 2023-01-11T21:44:28.7161747Z Generating XML reports... 
2023-01-11T21:44:28.7162169Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214202.xml 2023-01-11T21:44:28.7162536Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7162705Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7163075Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7163263Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7163282Z 2023-01-11T21:44:28.7163386Z Running tests... 2023-01-11T21:44:28.7163639Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7163947Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7164214Z test_scatter_complex (__main__.TestDistBackendWithSpawn) ... skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T21:44:28.7164234Z 2023-01-11T21:44:28.7164477Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7164586Z Ran 1 test in 0.002s 2023-01-11T21:44:28.7164605Z 2023-01-11T21:44:28.7164707Z OK (skipped=1) 2023-01-11T21:44:28.7164725Z 2023-01-11T21:44:28.7164845Z Generating XML reports... 
2023-01-11T21:44:28.7165284Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214205.xml 2023-01-11T21:44:28.7165644Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7165813Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7166243Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7166434Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7166457Z 2023-01-11T21:44:28.7166548Z Running tests... 2023-01-11T21:44:28.7166810Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7167116Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7167368Z test_scatter_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA gather (0.002s) 2023-01-11T21:44:28.7167388Z 2023-01-11T21:44:28.7167642Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7167752Z Ran 1 test in 0.002s 2023-01-11T21:44:28.7167771Z 2023-01-11T21:44:28.7167876Z OK (skipped=1) 2023-01-11T21:44:28.7167895Z 2023-01-11T21:44:28.7168015Z Generating XML reports... 
2023-01-11T21:44:28.7168453Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214207.xml 2023-01-11T21:44:28.7168846Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7169022Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7169397Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7169585Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7169604Z 2023-01-11T21:44:28.7169709Z Running tests... 2023-01-11T21:44:28.7169968Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7170272Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7170535Z test_scatter_cuda_complex (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA gather (0.002s) 2023-01-11T21:44:28.7170559Z 2023-01-11T21:44:28.7170796Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7170903Z Ran 1 test in 0.002s 2023-01-11T21:44:28.7170922Z 2023-01-11T21:44:28.7171027Z OK (skipped=1) 2023-01-11T21:44:28.7171046Z 2023-01-11T21:44:28.7171163Z Generating XML reports... 
2023-01-11T21:44:28.7171597Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214209.xml 2023-01-11T21:44:28.7171958Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7172130Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7172503Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7172690Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7172713Z 2023-01-11T21:44:28.7172803Z Running tests... 2023-01-11T21:44:28.7173059Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7173369Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7173635Z test_scatter_full_group (__main__.TestDistBackendWithSpawn) ... skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T21:44:28.7173654Z 2023-01-11T21:44:28.7173907Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7174016Z Ran 1 test in 0.002s 2023-01-11T21:44:28.7174035Z 2023-01-11T21:44:28.7174143Z OK (skipped=1) 2023-01-11T21:44:28.7174162Z 2023-01-11T21:44:28.7174284Z Generating XML reports... 
2023-01-11T21:44:28.7174719Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214212.xml 2023-01-11T21:44:28.7175067Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7175313Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7175696Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7175886Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7175905Z 2023-01-11T21:44:28.7176014Z Running tests... 2023-01-11T21:44:28.7176273Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7176852Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7177134Z test_scatter_group (__main__.TestDistBackendWithSpawn) ... skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T21:44:28.7177156Z 2023-01-11T21:44:28.7177428Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7177517Z Ran 1 test in 0.002s 2023-01-11T21:44:28.7177542Z 2023-01-11T21:44:28.7177650Z OK (skipped=1) 2023-01-11T21:44:28.7177669Z 2023-01-11T21:44:28.7177789Z Generating XML reports... 
2023-01-11T21:44:28.7178317Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214214.xml 2023-01-11T21:44:28.7178699Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7178872Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7179246Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7179436Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7179455Z 2023-01-11T21:44:28.7179544Z Running tests... 2023-01-11T21:44:28.7179802Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7180105Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7180487Z test_scatter_object_list (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T21:44:28.7180509Z 2023-01-11T21:44:28.7180759Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7180864Z Ran 1 test in 0.002s 2023-01-11T21:44:28.7180883Z 2023-01-11T21:44:28.7180988Z OK (skipped=1) 2023-01-11T21:44:28.7181007Z 2023-01-11T21:44:28.7181128Z Generating XML reports... 
2023-01-11T21:44:28.7181562Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214217.xml 2023-01-11T21:44:28.7181910Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7182082Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7182453Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7182646Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7182665Z 2023-01-11T21:44:28.7182770Z Running tests... 2023-01-11T21:44:28.7183028Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7183331Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7183573Z test_send_recv (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.7183774Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 28492 2023-01-11T21:44:28.7183987Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 28493 2023-01-11T21:44:28.7184347Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7184520Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7184975Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7185166Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7185528Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7185701Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7186073Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7186242Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7186484Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.7186728Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.7187122Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.7187556Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.7187781Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.7188001Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.7188273Z [1673473343.546439] [7c5487d9c02b:28492:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.7188500Z [1673473344.981978] [7c5487d9c02b:28492:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.7188720Z [1673473344.981978] [7c5487d9c02b:28492:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.7188993Z [1673473343.567183] [7c5487d9c02b:28493:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.7189222Z [1673473344.984798] [7c5487d9c02b:28493:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.7189458Z [1673473344.984798] [7c5487d9c02b:28493:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.7189560Z ok (6.139s) 2023-01-11T21:44:28.7189580Z 2023-01-11T21:44:28.7189848Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7189960Z Ran 1 test in 6.139s 2023-01-11T21:44:28.7189980Z 2023-01-11T21:44:28.7190071Z OK 2023-01-11T21:44:28.7190090Z 2023-01-11T21:44:28.7190211Z Generating XML reports... 
2023-01-11T21:44:28.7190632Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214219.xml 2023-01-11T21:44:28.7191004Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7191183Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7191555Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7191743Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7191763Z 2023-01-11T21:44:28.7191868Z Running tests... 2023-01-11T21:44:28.7192125Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7192428Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7192705Z test_send_recv_any_source (__main__.TestDistBackendWithSpawn) ... skip: ucc does not support send/recv from any source (0.002s) 2023-01-11T21:44:28.7192725Z 2023-01-11T21:44:28.7193027Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7193137Z Ran 1 test in 0.002s 2023-01-11T21:44:28.7193157Z 2023-01-11T21:44:28.7193265Z OK (skipped=1) 2023-01-11T21:44:28.7193284Z 2023-01-11T21:44:28.7193409Z Generating XML reports... 
2023-01-11T21:44:28.7193842Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214228.xml 2023-01-11T21:44:28.7194205Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7194377Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7194750Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7194935Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7194955Z 2023-01-11T21:44:28.7195044Z Running tests... 2023-01-11T21:44:28.7195306Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7195610Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7195955Z test_send_recv_any_source_autograd_profiler (__main__.TestDistBackendWithSpawn) ... skip: ucc does not support send/recv from any source (0.002s) 2023-01-11T21:44:28.7195977Z 2023-01-11T21:44:28.7196238Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7196350Z Ran 1 test in 0.002s 2023-01-11T21:44:28.7196369Z 2023-01-11T21:44:28.7196479Z OK (skipped=1) 2023-01-11T21:44:28.7196498Z 2023-01-11T21:44:28.7196622Z Generating XML reports... 
2023-01-11T21:44:28.7197056Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214230.xml 2023-01-11T21:44:28.7197403Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7197581Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7197956Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7198143Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7198162Z 2023-01-11T21:44:28.7198269Z Running tests... 2023-01-11T21:44:28.7198526Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7198831Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7199128Z test_send_recv_any_source_torch_profiler (__main__.TestDistBackendWithSpawn) ... skip: ucc does not support send/recv from any source (0.002s) 2023-01-11T21:44:28.7199148Z 2023-01-11T21:44:28.7199403Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7199497Z Ran 1 test in 0.002s 2023-01-11T21:44:28.7199520Z 2023-01-11T21:44:28.7199626Z OK (skipped=1) 2023-01-11T21:44:28.7199645Z 2023-01-11T21:44:28.7199767Z Generating XML reports... 
2023-01-11T21:44:28.7200205Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214233.xml 2023-01-11T21:44:28.7200571Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7200743Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7201114Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7201302Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7201322Z 2023-01-11T21:44:28.7201412Z Running tests... 2023-01-11T21:44:28.7201667Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7201971Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7202297Z test_send_recv_autograd_profiler (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.7202520Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 28701 2023-01-11T21:44:28.7202737Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 28702 2023-01-11T21:44:28.7203107Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7203281Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7203654Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7203824Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7204183Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7204358Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7204769Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7204957Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7205202Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.7205444Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.7205842Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.7206214Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.7206436Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.7206777Z STAGE:2023-01-11 21:42:39 28702:28702 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.7207004Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.7207332Z STAGE:2023-01-11 21:42:39 28701:28701 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.7207606Z [1673473359.463074] [7c5487d9c02b:28702:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.7207840Z [1673473361.103661] [7c5487d9c02b:28702:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.7208073Z [1673473361.103661] [7c5487d9c02b:28702:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.7208405Z STAGE:2023-01-11 21:42:41 28702:28702 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.7208676Z [1673473359.460626] [7c5487d9c02b:28701:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.7208889Z [1673473361.111328] [7c5487d9c02b:28701:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.7209122Z [1673473361.111328] [7c5487d9c02b:28701:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.7209456Z STAGE:2023-01-11 21:42:41 28701:28701 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.7209799Z STAGE:2023-01-11 21:42:41 28702:28702 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.7210135Z STAGE:2023-01-11 21:42:41 28701:28701 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.7210234Z ok (6.754s) 2023-01-11T21:44:28.7210254Z 2023-01-11T21:44:28.7210568Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7210678Z Ran 1 test in 6.755s 2023-01-11T21:44:28.7210698Z 2023-01-11T21:44:28.7210786Z OK 2023-01-11T21:44:28.7210805Z 2023-01-11T21:44:28.7210915Z Generating XML reports... 2023-01-11T21:44:28.7211357Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214235.xml 2023-01-11T21:44:28.7211721Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7211894Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7212267Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7212455Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7212474Z 2023-01-11T21:44:28.7212578Z Running tests... 
2023-01-11T21:44:28.7212837Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7213132Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7213410Z test_send_recv_nccl (__main__.TestDistBackendWithSpawn) ... skip: NCCL Send Recv Only (0.002s) 2023-01-11T21:44:28.7213430Z 2023-01-11T21:44:28.7213692Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7213803Z Ran 1 test in 0.002s 2023-01-11T21:44:28.7213823Z 2023-01-11T21:44:28.7213926Z OK (skipped=1) 2023-01-11T21:44:28.7213946Z 2023-01-11T21:44:28.7214068Z Generating XML reports... 2023-01-11T21:44:28.7214508Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214244.xml 2023-01-11T21:44:28.7214871Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7215046Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7215408Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7215599Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7215618Z 2023-01-11T21:44:28.7215724Z Running tests... 2023-01-11T21:44:28.7215982Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7216286Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7216764Z test_send_recv_nccl_autograd_profiler (__main__.TestDistBackendWithSpawn) ... 
skip: NCCL Send Recv Only (0.002s) 2023-01-11T21:44:28.7216787Z 2023-01-11T21:44:28.7217062Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7217171Z Ran 1 test in 0.002s 2023-01-11T21:44:28.7217192Z 2023-01-11T21:44:28.7217294Z OK (skipped=1) 2023-01-11T21:44:28.7217317Z 2023-01-11T21:44:28.7217419Z Generating XML reports... 2023-01-11T21:44:28.7217862Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214247.xml 2023-01-11T21:44:28.7218230Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7218403Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7218776Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7218963Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7218982Z 2023-01-11T21:44:28.7219089Z Running tests... 2023-01-11T21:44:28.7219348Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7219653Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7219891Z test_send_recv_nccl_torch_profiler (__main__.TestDistBackendWithSpawn) ... skip: NCCL Send Recv Only (0.003s) 2023-01-11T21:44:28.7219991Z 2023-01-11T21:44:28.7220262Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7220376Z Ran 1 test in 0.003s 2023-01-11T21:44:28.7220396Z 2023-01-11T21:44:28.7220502Z OK (skipped=1) 2023-01-11T21:44:28.7220521Z 2023-01-11T21:44:28.7220641Z Generating XML reports... 
2023-01-11T21:44:28.7221077Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214249.xml 2023-01-11T21:44:28.7221439Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7221610Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7221965Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7222151Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7222174Z 2023-01-11T21:44:28.7222279Z Running tests... 2023-01-11T21:44:28.7222536Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7222901Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7223173Z test_send_recv_torch_profiler (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.7223392Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 28914 2023-01-11T21:44:28.7223606Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 28915 2023-01-11T21:44:28.7223972Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7224129Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7224500Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7224689Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7225044Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7225213Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7225579Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7225763Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7226008Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.7226231Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.7226625Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.7227020Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.7227247Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.7227578Z STAGE:2023-01-11 21:42:55 28914:28914 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.7227801Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.7228122Z STAGE:2023-01-11 21:42:55 28915:28915 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.7228393Z [1673473375.904915] [7c5487d9c02b:28915:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.7228623Z [1673473377.534643] [7c5487d9c02b:28915:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.7228913Z [1673473377.534643] [7c5487d9c02b:28915:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.7229238Z STAGE:2023-01-11 21:42:57 28915:28915 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.7229512Z [1673473375.904877] [7c5487d9c02b:28914:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.7229737Z [1673473377.536062] [7c5487d9c02b:28914:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.7229968Z [1673473377.536062] [7c5487d9c02b:28914:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.7230305Z STAGE:2023-01-11 21:42:57 28914:28914 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.7230648Z STAGE:2023-01-11 21:42:57 28915:28915 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.7230990Z STAGE:2023-01-11 21:42:57 28914:28914 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.7231091Z ok (6.648s) 2023-01-11T21:44:28.7231155Z 2023-01-11T21:44:28.7231425Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7231519Z Ran 1 test in 6.648s 2023-01-11T21:44:28.7231539Z 2023-01-11T21:44:28.7231629Z OK 2023-01-11T21:44:28.7231648Z 2023-01-11T21:44:28.7231770Z Generating XML reports... 2023-01-11T21:44:28.7232207Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214251.xml 2023-01-11T21:44:28.7232569Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7232738Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7233109Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7233302Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7233322Z 2023-01-11T21:44:28.7233429Z Running tests... 
2023-01-11T21:44:28.7233671Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7233981Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7234237Z test_send_recv_with_tag (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.7234452Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 29028 2023-01-11T21:44:28.7234722Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 29029 2023-01-11T21:44:28.7235092Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7235269Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7235646Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7235821Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7236177Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7236353Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7236720Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7236908Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7237149Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.7237390Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.7237860Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.7238253Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.7238463Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.7238688Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.7238958Z [1673473385.113229] [7c5487d9c02b:29028:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.7239186Z [1673473386.562962] [7c5487d9c02b:29028:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.7239419Z [1673473386.562962] [7c5487d9c02b:29028:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.7239688Z [1673473385.114896] [7c5487d9c02b:29029:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.7239962Z [1673473386.519834] [7c5487d9c02b:29029:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.7240202Z [1673473386.519834] [7c5487d9c02b:29029:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.7240305Z ok (6.157s) 2023-01-11T21:44:28.7240324Z 2023-01-11T21:44:28.7240592Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7240687Z Ran 1 test in 6.158s 2023-01-11T21:44:28.7240706Z 2023-01-11T21:44:28.7240795Z OK 2023-01-11T21:44:28.7240815Z 2023-01-11T21:44:28.7240936Z Generating XML reports... 
2023-01-11T21:44:28.7241377Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214301.xml 2023-01-11T21:44:28.7241748Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7241926Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7242300Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7242485Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7242504Z 2023-01-11T21:44:28.7242595Z Running tests... 2023-01-11T21:44:28.7242849Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7243156Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7243437Z test_send_recv_with_tag_autograd_profiler (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.7243653Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 29138 2023-01-11T21:44:28.7243869Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 29139 2023-01-11T21:44:28.7244236Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7244404Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7244779Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7244949Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7245308Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7245476Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7245837Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7246076Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7246320Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.7246562Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.7246964Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.7247340Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.7247565Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.7247898Z STAGE:2023-01-11 21:43:13 29138:29138 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.7248124Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.7248453Z STAGE:2023-01-11 21:43:13 29139:29139 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.7248768Z [1673473393.756453] [7c5487d9c02b:29138:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.7249002Z [1673473395.383239] [7c5487d9c02b:29138:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.7249239Z [1673473395.383239] [7c5487d9c02b:29138:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.7249576Z STAGE:2023-01-11 21:43:15 29138:29138 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.7249842Z [1673473393.759065] [7c5487d9c02b:29139:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.7250051Z [1673473395.429404] [7c5487d9c02b:29139:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.7250288Z [1673473395.429404] [7c5487d9c02b:29139:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.7250616Z STAGE:2023-01-11 21:43:15 29139:29139 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.7250961Z STAGE:2023-01-11 21:43:15 29138:29138 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.7251299Z STAGE:2023-01-11 21:43:15 29139:29139 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.7251400Z ok (6.701s) 2023-01-11T21:44:28.7251420Z 2023-01-11T21:44:28.7251680Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7251792Z Ran 1 test in 6.701s 2023-01-11T21:44:28.7251812Z 2023-01-11T21:44:28.7251903Z OK 2023-01-11T21:44:28.7251922Z 2023-01-11T21:44:28.7252031Z Generating XML reports... 2023-01-11T21:44:28.7252472Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214309.xml 2023-01-11T21:44:28.7252841Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7253014Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7253386Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7253574Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7253594Z 2023-01-11T21:44:28.7253700Z Running tests... 
2023-01-11T21:44:28.7253999Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7254297Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7254571Z test_send_recv_with_tag_torch_profiler (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.7254846Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 29252 2023-01-11T21:44:28.7255065Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 29253 2023-01-11T21:44:28.7255434Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7255609Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7255979Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7256167Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7256527Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7256982Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7257366Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7257629Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7257885Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.7258125Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.7258527Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.7258919Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.7259147Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.7259477Z STAGE:2023-01-11 21:43:22 29253:29253 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.7259689Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.7260020Z STAGE:2023-01-11 21:43:23 29252:29252 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:44:28.7260293Z [1673473403.029759] [7c5487d9c02b:29252:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.7260521Z [1673473404.722489] [7c5487d9c02b:29252:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.7260753Z [1673473404.722489] [7c5487d9c02b:29252:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.7261090Z STAGE:2023-01-11 21:43:25 29252:29252 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.7261356Z [1673473403.029697] [7c5487d9c02b:29253:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.7261581Z [1673473404.663041] [7c5487d9c02b:29253:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.7261811Z [1673473404.663041] [7c5487d9c02b:29253:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.7262141Z STAGE:2023-01-11 21:43:25 29253:29253 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:44:28.7262467Z STAGE:2023-01-11 21:43:25 29253:29253 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.7262805Z STAGE:2023-01-11 21:43:25 29252:29252 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:44:28.7262901Z ok (6.736s) 2023-01-11T21:44:28.7262921Z 2023-01-11T21:44:28.7263181Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7263361Z Ran 1 test in 6.736s 2023-01-11T21:44:28.7263381Z 2023-01-11T21:44:28.7263473Z OK 2023-01-11T21:44:28.7263491Z 2023-01-11T21:44:28.7263616Z Generating XML reports... 2023-01-11T21:44:28.7264065Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214319.xml 2023-01-11T21:44:28.7264415Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7264586Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7264958Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7265147Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7265166Z 2023-01-11T21:44:28.7265274Z Running tests... 
2023-01-11T21:44:28.7265531Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7265842Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7266166Z test_sparse_all_reduce_sum (__main__.TestDistBackendWithSpawn) ... skip: Only Gloo backend support sparse all reduce (0.002s) 2023-01-11T21:44:28.7266188Z 2023-01-11T21:44:28.7266447Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7266540Z Ran 1 test in 0.002s 2023-01-11T21:44:28.7266558Z 2023-01-11T21:44:28.7266665Z OK (skipped=1) 2023-01-11T21:44:28.7266684Z 2023-01-11T21:44:28.7266807Z Generating XML reports... 2023-01-11T21:44:28.7267242Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214328.xml 2023-01-11T21:44:28.7267609Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7267782Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7268152Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7268344Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7268363Z 2023-01-11T21:44:28.7268474Z Running tests... 2023-01-11T21:44:28.7268718Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7269024Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7269306Z test_sparse_all_reduce_sum_cuda (__main__.TestDistBackendWithSpawn) ... 
skip: Only Gloo backend support sparse all reduce (0.002s) 2023-01-11T21:44:28.7269326Z 2023-01-11T21:44:28.7269585Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7269694Z Ran 1 test in 0.002s 2023-01-11T21:44:28.7269713Z 2023-01-11T21:44:28.7269816Z OK (skipped=1) 2023-01-11T21:44:28.7269835Z 2023-01-11T21:44:28.7269955Z Generating XML reports... 2023-01-11T21:44:28.7270392Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214330.xml 2023-01-11T21:44:28.7270760Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7270916Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7271290Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7271479Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7271499Z 2023-01-11T21:44:28.7271606Z Running tests... 2023-01-11T21:44:28.7271864Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7272169Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7272428Z test_stateless_api_with_ddp (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.7272697Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 29432 2023-01-11T21:44:28.7272899Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 29433 2023-01-11T21:44:28.7273266Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7273439Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7273809Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7273994Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7274348Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7274518Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7274892Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7275064Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7275351Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.7275597Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.7275993Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.7276382Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.7276608Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.7276833Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.7277109Z [1673473418.559886] [7c5487d9c02b:29432:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.7277340Z [1673473418.574229] [7c5487d9c02b:29432:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.7277573Z [1673473418.574229] [7c5487d9c02b:29432:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.7277825Z [1673473418.560182] [7c5487d9c02b:29433:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.7278051Z [1673473418.574215] [7c5487d9c02b:29433:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.7278286Z [1673473418.574215] [7c5487d9c02b:29433:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.7278388Z ok (6.642s) 2023-01-11T21:44:28.7278409Z 2023-01-11T21:44:28.7278673Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7278783Z Ran 1 test in 6.642s 2023-01-11T21:44:28.7278803Z 2023-01-11T21:44:28.7278896Z OK 2023-01-11T21:44:28.7278916Z 2023-01-11T21:44:28.7279036Z Generating XML reports... 
2023-01-11T21:44:28.7279476Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214333.xml 2023-01-11T21:44:28.7279828Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7280004Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7280378Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7280567Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7280639Z 2023-01-11T21:44:28.7280751Z Running tests... 2023-01-11T21:44:28.7281014Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7281325Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7281583Z test_static_graph_api_cpu (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.7281783Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 29550 2023-01-11T21:44:28.7281999Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 29551 2023-01-11T21:44:28.7282362Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7282536Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7282908Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7283098Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7283453Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7283668Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7284047Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7284215Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7284455Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.7284696Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.7285090Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.7285480Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.7285711Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.7285941Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.7286213Z [1673473426.395422] [7c5487d9c02b:29550:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.7286443Z [1673473427.857414] [7c5487d9c02b:29550:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.7286661Z [1673473427.857414] [7c5487d9c02b:29550:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.7286923Z [1673473426.398129] [7c5487d9c02b:29551:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.7287148Z [1673473427.833035] [7c5487d9c02b:29551:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.7287385Z [1673473427.833035] [7c5487d9c02b:29551:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.7287487Z ok (6.239s) 2023-01-11T21:44:28.7287507Z 2023-01-11T21:44:28.7287772Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7287886Z Ran 1 test in 6.239s 2023-01-11T21:44:28.7287906Z 2023-01-11T21:44:28.7287993Z OK 2023-01-11T21:44:28.7288012Z 2023-01-11T21:44:28.7288133Z Generating XML reports... 
2023-01-11T21:44:28.7288559Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214342.xml 2023-01-11T21:44:28.7288926Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7289155Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7289531Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7289723Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7289743Z 2023-01-11T21:44:28.7289851Z Running tests... 2023-01-11T21:44:28.7290113Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7290421Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7290718Z test_sync_bn_logged (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl & Gloo backend support DistributedDataParallel (0.002s) 2023-01-11T21:44:28.7290738Z 2023-01-11T21:44:28.7290979Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7291089Z Ran 1 test in 0.002s 2023-01-11T21:44:28.7291108Z 2023-01-11T21:44:28.7291219Z OK (skipped=1) 2023-01-11T21:44:28.7291238Z 2023-01-11T21:44:28.7291357Z Generating XML reports... 
2023-01-11T21:44:28.7291879Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214351.xml 2023-01-11T21:44:28.7292255Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7292432Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7292802Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7292985Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7293004Z 2023-01-11T21:44:28.7293094Z Running tests... 2023-01-11T21:44:28.7293351Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7293656Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7293945Z test_undefined_grad_parity_unused_parameters (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.7294164Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 29697 2023-01-11T21:44:28.7294378Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 29698 2023-01-11T21:44:28.7294745Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7294917Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7295272Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7295460Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7295818Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7295991Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7296364Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7296776Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7297029Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.7297266Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.7297665Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.7298037Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.7298263Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.7298588Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.7298865Z [1673473438.893139] [7c5487d9c02b:29698:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.7299096Z [1673473438.906492] [7c5487d9c02b:29698:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.7299331Z [1673473438.906492] [7c5487d9c02b:29698:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.7300106Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2023-01-11T21:44:28.7300428Z [1673473438.892627] [7c5487d9c02b:29697:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.7300662Z [1673473438.906482] [7c5487d9c02b:29697:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.7300891Z [1673473438.906482] [7c5487d9c02b:29697:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.7301653Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. 
This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2023-01-11T21:44:28.7301760Z ok (6.512s) 2023-01-11T21:44:28.7301780Z 2023-01-11T21:44:28.7302044Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7302138Z Ran 1 test in 6.512s 2023-01-11T21:44:28.7302175Z 2023-01-11T21:44:28.7302248Z OK 2023-01-11T21:44:28.7302267Z 2023-01-11T21:44:28.7302388Z Generating XML reports... 2023-01-11T21:44:28.7302826Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214353.xml 2023-01-11T21:44:28.7303193Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7303366Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7303744Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7303938Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7303958Z 2023-01-11T21:44:28.7304068Z Running tests... 2023-01-11T21:44:28.7304309Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7304617Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7304896Z test_verify_model_across_rank_with_logger (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.7305110Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 29815 2023-01-11T21:44:28.7305323Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 29816 2023-01-11T21:44:28.7305682Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7305900Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7306273Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7306464Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7306807Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7306976Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7307341Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7307529Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7307773Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.7308015Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.7308415Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.7308847Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.7309064Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.7309288Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.7309524Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:44:28.7309761Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:44:28.7310155Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.7310552Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.7310792Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T21:44:28.7311026Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T21:44:28.7311409Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T21:44:28.7311794Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T21:44:28.7312050Z [1673473448.030502] [7c5487d9c02b:29816:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:44:28.7312278Z [1673473448.043758] [7c5487d9c02b:29816:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.7312521Z [1673473448.043758] [7c5487d9c02b:29816:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.7312897Z [1673473453.407494] [7c5487d9c02b:29816:0] tag_match.c:62 UCX WARN unexpected tag-receive descriptor 0x34d81ac0 was not matched 2023-01-11T21:44:28.7313163Z [1673473448.024765] [7c5487d9c02b:29815:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.7313390Z [1673473448.038282] [7c5487d9c02b:29815:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.7313620Z [1673473448.038282] [7c5487d9c02b:29815:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.7313923Z [1673473453.376948] [7c5487d9c02b:29815:1] ucc_schedule.h:189 UCC WARN timeout 5 sec. has expired on req 0x31b34080, seq_num 5, TL_UCP, team_id 1, size 2, rank 0, ctx_rank 0: Barrier n/a inplace=0 bytes=0 2023-01-11T21:44:28.7314245Z [1673473453.417578] [7c5487d9c02b:29815:0] mpool.c:55 UCX WARN object 0x31c45580 {flags:0x20040 recv length 0 host memory} was not returned to mpool ucp_requests 2023-01-11T21:44:28.7314350Z ok (11.232s) 2023-01-11T21:44:28.7314370Z 2023-01-11T21:44:28.7314619Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7314731Z Ran 1 test in 11.232s 2023-01-11T21:44:28.7314750Z 2023-01-11T21:44:28.7314840Z OK 2023-01-11T21:44:28.7314859Z 2023-01-11T21:44:28.7314981Z Generating XML reports... 
2023-01-11T21:44:28.7315422Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214402.xml 2023-01-11T21:44:28.7315787Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7315962Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7316340Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7316575Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7316596Z 2023-01-11T21:44:28.7316692Z Running tests... 2023-01-11T21:44:28.7316951Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7317258Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:44:28.7317539Z test_verify_model_across_rank_without_logger (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:44:28.7317756Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 29935 2023-01-11T21:44:28.7317970Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 29936 2023-01-11T21:44:28.7318337Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7318514Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7318872Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7319060Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7319419Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T21:44:28.7319587Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:44:28.7319957Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:44:28.7320142Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:44:28.7320384Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:44:28.7320627Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:44:28.7321024Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:44:28.7321396Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:44:28.7321625Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:44:28.7321848Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:44:28.7322083Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:44:28.7322320Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:44:28.7322710Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.7323148Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:44:28.7323388Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T21:44:28.7323619Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T21:44:28.7323982Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T21:44:28.7324369Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T21:44:28.7324640Z [1673473461.845950] [7c5487d9c02b:29935:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.7324865Z [1673473461.859914] [7c5487d9c02b:29935:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.7325148Z [1673473461.859914] [7c5487d9c02b:29935:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.7325466Z [1673473467.192296] [7c5487d9c02b:29935:1] ucc_schedule.h:189 UCC WARN timeout 5 sec. 
has expired on req 0x34eaa240, seq_num 5, TL_UCP, team_id 1, size 2, rank 0, ctx_rank 0: Barrier n/a inplace=0 bytes=0 2023-01-11T21:44:28.7325739Z [1673473467.228683] [7c5487d9c02b:29935:0] mpool.c:55 UCX WARN object 0x34fdf140 {flags:0x20040 recv length 0 host memory} was not returned to mpool ucp_requests 2023-01-11T21:44:28.7326009Z [1673473461.845977] [7c5487d9c02b:29936:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:44:28.7326234Z [1673473461.859941] [7c5487d9c02b:29936:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:44:28.7326467Z [1673473461.859941] [7c5487d9c02b:29936:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:44:28.7326847Z [1673473467.238716] [7c5487d9c02b:29936:0] tag_match.c:62 UCX WARN unexpected tag-receive descriptor 0x36bd6200 was not matched 2023-01-11T21:44:28.7326932Z ok (11.257s) 2023-01-11T21:44:28.7326952Z 2023-01-11T21:44:28.7327217Z ---------------------------------------------------------------------- 2023-01-11T21:44:28.7327330Z Ran 1 test in 11.257s 2023-01-11T21:44:28.7327349Z 2023-01-11T21:44:28.7327440Z OK 2023-01-11T21:44:28.7327460Z 2023-01-11T21:44:28.7327583Z Generating XML reports... 
2023-01-11T21:44:28.7328026Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214416.xml 2023-01-11T21:44:28.7328045Z 2023-01-11T21:44:28.7328516Z ##[endgroup] 2023-01-11T21:44:28.7328977Z FINISHED PRINTING LOG FILE of distributed/test_distributed_spawn (/var/lib/jenkins/workspace/test/test-reports/distributed-test_distributed_spawn_fsaba4i5) 2023-01-11T21:44:28.7329001Z 2023-01-11T21:44:28.7329198Z Running distributed tests for the ucc backend with file init_method in shard 3 of 3 2023-01-11T21:44:28.7329711Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '-v', '--subprocess', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:44:28.462470] 2023-01-11T22:10:24.3663106Z 2023-01-11T22:10:24.3663820Z Expand the folded group to see the log file of distributed/test_distributed_spawn 2023-01-11T22:10:24.3665498Z ##[group]PRINTING LOG FILE of distributed/test_distributed_spawn (/var/lib/jenkins/workspace/test/test-reports/distributed-test_distributed_spawn_q8dvsh36) 2023-01-11T22:10:24.3666369Z 2023-01-11T22:10:24.3708040Z , <__main__.TestDistBackendWithSpawn testMethod=test_3_level_hierarchical_model_averager>, <__main__.TestDistBackendWithSpawn testMethod=test_Backend_enum_class>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallelCPU>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallelCPU_grad_is_view>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel_SyncBatchNorm>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel_SyncBatchNorm_2D_Input>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel_SyncBatchNorm_Channels_Last>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_Running_Value>, 
<__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_gradient>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel_SyncBatchNorm_No_Affine>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel_SyncBatchNorm_Single_Input_Per_Process>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel_non_default_stream>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel_requires_grad>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel_with_amp_and_grad_is_view>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedSampler_padding>, <__main__.TestDistBackendWithSpawn testMethod=test_SyncBatchNorm_process_group>, <__main__.TestDistBackendWithSpawn testMethod=test_accumulate_gradients_no_sync>, <__main__.TestDistBackendWithSpawn testMethod=test_accumulate_gradients_no_sync_allreduce_hook>, <__main__.TestDistBackendWithSpawn testMethod=test_accumulate_gradients_no_sync_allreduce_with_then_hook>, <__main__.TestDistBackendWithSpawn testMethod=test_accumulate_gradients_no_sync_grad_is_view>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_coalesced_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_coalesced_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_coalesced_group>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_coalesced_simple>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_coalesced_with_empty>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_cuda_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_full_group>, <__main__.TestDistBackendWithSpawn 
testMethod=test_all_gather_group>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_into_cat_tensor_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_into_stack_tensor_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_multigpu>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_multigpu_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_object_default_pg>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_object_subgroup>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_v_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_full_group_max>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_full_group_min>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_full_group_product>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_full_group_sum>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_group_max>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_group_min>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_group_product>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_group_sum>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_max>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_max_complex_unsupported>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_min>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_product>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_sum>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_complex_unsupported_ops>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_full_group_max>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_full_group_min>, <__main__.TestDistBackendWithSpawn 
testMethod=test_all_reduce_full_group_product>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_full_group_sum>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_group_max>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_group_min>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_group_product>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_group_sum>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_max>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_min>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_multigpu>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_multigpu_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_product>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_result_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_sum>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_sum_async>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_sum_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_sum_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_sum_cuda_async>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_sum_cuda_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_cuda_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_full_group_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_group>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_group_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_equal_split>, 
<__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_equal_split_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_equal_split_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_equal_split_cuda_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_equal_split_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_equal_split_full_group_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_equal_split_group>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_equal_split_group_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_unequal_split>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_unequal_split_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_unequal_split_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_unequal_split_cuda_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_unequal_split_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_unequal_split_full_group_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_unequal_split_group>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_unequal_split_group_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_average_parameters>, <__main__.TestDistBackendWithSpawn testMethod=test_backend_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_backend_group>, <__main__.TestDistBackendWithSpawn testMethod=test_barrier>, <__main__.TestDistBackendWithSpawn testMethod=test_barrier_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_barrier_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_barrier_full_group_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_barrier_group>, <__main__.TestDistBackendWithSpawn 
testMethod=test_barrier_group_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_barrier_timeout_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_barrier_timeout_global>, <__main__.TestDistBackendWithSpawn testMethod=test_barrier_timeout_group>, <__main__.TestDistBackendWithSpawn testMethod=test_batch_isend_irecv_gloo>, <__main__.TestDistBackendWithSpawn testMethod=test_batch_isend_irecv_gloo_tags>, <__main__.TestDistBackendWithSpawn testMethod=test_batch_isend_irecv_mixed_backend_err>, <__main__.TestDistBackendWithSpawn testMethod=test_batch_isend_irecv_nccl>, <__main__.TestDistBackendWithSpawn testMethod=test_batch_isend_irecv_no_rank_zero_nccl>, <__main__.TestDistBackendWithSpawn testMethod=test_batch_isend_irecv_op_err>, <__main__.TestDistBackendWithSpawn testMethod=test_batch_isend_irecv_op_list_err>, <__main__.TestDistBackendWithSpawn testMethod=test_batch_isend_irecv_ring_exchange_nccl>, <__main__.TestDistBackendWithSpawn testMethod=test_batch_isend_irecv_self_nccl>, <__main__.TestDistBackendWithSpawn testMethod=test_batch_isend_irecv_tensor_err>, <__main__.TestDistBackendWithSpawn testMethod=test_broadcast>, <__main__.TestDistBackendWithSpawn testMethod=test_broadcast_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_broadcast_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_broadcast_group>, <__main__.TestDistBackendWithSpawn testMethod=test_broadcast_multigpu>, <__main__.TestDistBackendWithSpawn testMethod=test_broadcast_object_list>, <__main__.TestDistBackendWithSpawn testMethod=test_compute_bucket_assignment_by_size_sparse_error_with_logger>, <__main__.TestDistBackendWithSpawn testMethod=test_compute_bucket_assignment_by_size_sparse_error_without_logger>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_apply_optim_in_backward>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_apply_optim_in_backward_grad_as_bucket_view_false>, <__main__.TestDistBackendWithSpawn 
testMethod=test_ddp_apply_optim_in_backward_ignored_params>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_broadcast_buffer>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_broadcast_buffer_via_hook>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_buffer_hook_allreduce>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_buffer_hook_allreduce_return_future>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_build_debug_param_to_name_mapping>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_build_debug_param_to_name_mapping_requires_grad>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_comm_hook_logging>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_control_flow_different_across_ranks>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_control_flow_same_across_ranks>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_create_graph>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_device>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_forward_backward_hook>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_grad_div_uneven_inputs>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_parity_allreduce>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_parity_allreduce_process_group>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_parity_post_localSGD>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_parity_powerSGD>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_pickling_powerSGD>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_adam_optimize_subset_False>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_adam_optimize_subset_True>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_False_optimize_subset_False>, <__main__.TestDistBackendWithSpawn 
testMethod=test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_False_optimize_subset_True>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_True_optimize_subset_False>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_True_optimize_subset_True>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_False_optimize_subset_False>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_False_optimize_subset_True>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_True_optimize_subset_False>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_True_optimize_subset_True>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_sgd_optimize_subset_False>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_sgd_optimize_subset_True>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_ignore_params_arg>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_inference>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_join_model_equivalence>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_logging_data_cpu>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_logging_data_gpu>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_model_diff_num_params_across_ranks>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_model_diff_shape_across_ranks>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_multiple_nested_unused_params_err_ignore_params>, <__main__.TestDistBackendWithSpawn 
testMethod=test_ddp_multiple_nested_unused_params_error>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_namedtuple>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_new_tensor_in_fwd>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_new_tensor_in_fwd_static_graph>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_profiling_autograd_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_profiling_torch_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_python_error_logged>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_returns_tensor_with_no_grad>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_shared_grad_acc_unused_params>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_static_graph_nested_types>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_sync_bn_training_vs_eval>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_sync_module_states>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_uneven_input_exception>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_uneven_input_join_disable>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_uneven_inputs>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_uneven_inputs_stop_iteration_sync_bn>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_unused_params_rebuild_buckets_exception>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_zero_output_features>, <__main__.TestDistBackendWithSpawn testMethod=test_destroy_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_destroy_group>, <__main__.TestDistBackendWithSpawn testMethod=test_detect_ddp_is_actually_static>, <__main__.TestDistBackendWithSpawn testMethod=test_different_graph_across_ranks>, <__main__.TestDistBackendWithSpawn testMethod=test_dump_DDP_relevant_env_vars>, <__main__.TestDistBackendWithSpawn testMethod=test_gather>, <__main__.TestDistBackendWithSpawn testMethod=test_gather_checks>, <__main__.TestDistBackendWithSpawn 
testMethod=test_gather_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_gather_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_gather_group>, <__main__.TestDistBackendWithSpawn testMethod=test_gather_object>, <__main__.TestDistBackendWithSpawn testMethod=test_gather_object_subgroup>, <__main__.TestDistBackendWithSpawn testMethod=test_get_backend>, <__main__.TestDistBackendWithSpawn testMethod=test_get_future>, <__main__.TestDistBackendWithSpawn testMethod=test_get_rank>, <__main__.TestDistBackendWithSpawn testMethod=test_get_rank_size_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_get_rank_size_group>, <__main__.TestDistBackendWithSpawn testMethod=test_invalid_static_graph>, <__main__.TestDistBackendWithSpawn testMethod=test_irecv>, <__main__.TestDistBackendWithSpawn testMethod=test_isend>, <__main__.TestDistBackendWithSpawn testMethod=test_isend_autograd_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_isend_torch_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_monitored_barrier_allreduce_hang>, <__main__.TestDistBackendWithSpawn testMethod=test_monitored_barrier_allreduce_hang_wait_all_ranks>, <__main__.TestDistBackendWithSpawn testMethod=test_monitored_barrier_failure_order>, <__main__.TestDistBackendWithSpawn testMethod=test_monitored_barrier_gloo>, <__main__.TestDistBackendWithSpawn testMethod=test_monitored_barrier_gloo_rank_0_timeout>, <__main__.TestDistBackendWithSpawn testMethod=test_monitored_barrier_gloo_subgroup>, <__main__.TestDistBackendWithSpawn testMethod=test_monitored_barrier_wait_all_ranks>, <__main__.TestDistBackendWithSpawn testMethod=test_nccl_backend_bool_allgather>, <__main__.TestDistBackendWithSpawn testMethod=test_nccl_backend_bool_allreduce>, <__main__.TestDistBackendWithSpawn testMethod=test_nccl_backend_bool_broadcast>, <__main__.TestDistBackendWithSpawn testMethod=test_nccl_backend_bool_reduce>, <__main__.TestDistBackendWithSpawn 
testMethod=test_nccl_high_priority_stream>, <__main__.TestDistBackendWithSpawn testMethod=test_new_subgroups>, <__main__.TestDistBackendWithSpawn testMethod=test_new_subgroups_by_enumeration>, <__main__.TestDistBackendWithSpawn testMethod=test_new_subgroups_by_enumeration_input_rank_exceeds_world_size>, <__main__.TestDistBackendWithSpawn testMethod=test_new_subgroups_by_enumeration_negative_input_rank>, <__main__.TestDistBackendWithSpawn testMethod=test_new_subgroups_group_size_exceeds_world_size>, <__main__.TestDistBackendWithSpawn testMethod=test_new_subgroups_overlap_not_allowed>, <__main__.TestDistBackendWithSpawn testMethod=test_new_subgroups_world_size_not_divisible_by_group_size>, <__main__.TestDistBackendWithSpawn testMethod=test_output_unused_in_loss_dict_module>, <__main__.TestDistBackendWithSpawn testMethod=test_output_unused_in_loss_tuple_module>, <__main__.TestDistBackendWithSpawn testMethod=test_periodic_model_averager>, <__main__.TestDistBackendWithSpawn testMethod=test_periodic_model_averager_param_group>, <__main__.TestDistBackendWithSpawn testMethod=test_post_localSGD_optimizer_parity>, <__main__.TestDistBackendWithSpawn testMethod=test_post_localSGD_optimizer_parity_grad_is_view>, <__main__.TestDistBackendWithSpawn testMethod=test_post_localSGD_optimizer_parity_with_hierarchical_sgd>, <__main__.TestDistBackendWithSpawn testMethod=test_post_localSGD_optimizer_parity_with_hierarchical_sgd_grad_is_view>, <__main__.TestDistBackendWithSpawn testMethod=test_post_localSGD_optimizer_step_reload>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_full_group_max>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_full_group_min>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_full_group_product>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_full_group_sum>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_group_max>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_group_min>, 
<__main__.TestDistBackendWithSpawn testMethod=test_reduce_group_product>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_group_sum>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_max>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_min>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_multigpu>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_product>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_scatter_tensor_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_scatter_v_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_sum>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_sum_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_sum_cuda_twice>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_sum_twice>, <__main__.TestDistBackendWithSpawn testMethod=test_scatter>, <__main__.TestDistBackendWithSpawn testMethod=test_scatter_checks>, <__main__.TestDistBackendWithSpawn testMethod=test_scatter_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_scatter_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_scatter_cuda_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_scatter_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_scatter_group>, <__main__.TestDistBackendWithSpawn testMethod=test_scatter_object_list>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_any_source>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_any_source_autograd_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_any_source_torch_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_autograd_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_nccl>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_nccl_autograd_profiler>, <__main__.TestDistBackendWithSpawn 
testMethod=test_send_recv_nccl_torch_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_torch_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_with_tag>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_with_tag_autograd_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_with_tag_torch_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_sparse_all_reduce_sum>, <__main__.TestDistBackendWithSpawn testMethod=test_sparse_all_reduce_sum_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_stateless_api_with_ddp>, <__main__.TestDistBackendWithSpawn testMethod=test_static_graph_api_cpu>, <__main__.TestDistBackendWithSpawn testMethod=test_sync_bn_logged>, <__main__.TestDistBackendWithSpawn testMethod=test_undefined_grad_parity_unused_parameters>, <__main__.TestDistBackendWithSpawn testMethod=test_verify_model_across_rank_with_logger>, <__main__.TestDistBackendWithSpawn testMethod=test_verify_model_across_rank_without_logger>]> 2023-01-11T22:10:24.3749132Z test_1_level_hierarchical_model_averager_equivalent_to_periodic_model_averager (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3751417Z test_3_level_hierarchical_model_averager (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3752024Z test_Backend_enum_class (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3752602Z test_DistributedDataParallel (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3753048Z test_DistributedDataParallelCPU (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3753693Z test_DistributedDataParallelCPU_grad_is_view (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3754347Z test_DistributedDataParallel_SyncBatchNorm (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3754906Z test_DistributedDataParallel_SyncBatchNorm_2D_Input (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3755506Z test_DistributedDataParallel_SyncBatchNorm_Channels_Last 
(__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3756376Z test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_Running_Value (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3757049Z test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_gradient (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3757758Z test_DistributedDataParallel_SyncBatchNorm_No_Affine (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3758499Z test_DistributedDataParallel_SyncBatchNorm_Single_Input_Per_Process (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3759181Z test_DistributedDataParallel_non_default_stream (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3759776Z test_DistributedDataParallel_requires_grad (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3760298Z test_DistributedDataParallel_with_amp_and_grad_is_view (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3760941Z test_DistributedSampler_padding (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3761538Z test_SyncBatchNorm_process_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3762012Z test_accumulate_gradients_no_sync (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3762753Z test_accumulate_gradients_no_sync_allreduce_hook (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3763475Z test_accumulate_gradients_no_sync_allreduce_with_then_hook (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3764111Z test_accumulate_gradients_no_sync_grad_is_view (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3764515Z test_all_gather (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3765082Z test_all_gather_coalesced_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3765613Z test_all_gather_coalesced_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3766114Z test_all_gather_coalesced_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3766676Z test_all_gather_coalesced_simple 
(__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3767152Z test_all_gather_coalesced_with_empty (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3767712Z test_all_gather_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3768235Z test_all_gather_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3768635Z test_all_gather_cuda_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3769342Z test_all_gather_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3770081Z test_all_gather_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3770775Z test_all_gather_into_cat_tensor_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3771577Z test_all_gather_into_stack_tensor_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3772356Z test_all_gather_multigpu (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3773093Z test_all_gather_multigpu_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3773892Z test_all_gather_object_default_pg (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3774712Z test_all_gather_object_subgroup (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3775453Z test_all_gather_v_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3776182Z test_all_reduce_coalesced_full_group_max (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3777687Z test_all_reduce_coalesced_full_group_min (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3778581Z test_all_reduce_coalesced_full_group_product (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3779381Z test_all_reduce_coalesced_full_group_sum (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3780195Z test_all_reduce_coalesced_group_max (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3781019Z test_all_reduce_coalesced_group_min (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3781823Z test_all_reduce_coalesced_group_product (__main__.TestDistBackendWithSpawn) 
2023-01-11T22:10:24.3782594Z test_all_reduce_coalesced_group_sum (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3783546Z test_all_reduce_coalesced_max (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3784367Z test_all_reduce_coalesced_max_complex_unsupported (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3785158Z test_all_reduce_coalesced_min (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3785970Z test_all_reduce_coalesced_product (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3786749Z test_all_reduce_coalesced_sum (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3787563Z test_all_reduce_complex_unsupported_ops (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3788345Z test_all_reduce_full_group_max (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3789110Z test_all_reduce_full_group_min (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3789848Z test_all_reduce_full_group_product (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3790590Z test_all_reduce_full_group_sum (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3791339Z test_all_reduce_group_max (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3792017Z test_all_reduce_group_min (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3792762Z test_all_reduce_group_product (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3793573Z test_all_reduce_group_sum (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3794295Z test_all_reduce_max (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3794966Z test_all_reduce_min (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3795643Z test_all_reduce_multigpu (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3796406Z test_all_reduce_multigpu_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3797135Z test_all_reduce_product (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3797884Z test_all_reduce_result_cuda 
(__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3798644Z test_all_reduce_sum (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3799340Z test_all_reduce_sum_async (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3800076Z test_all_reduce_sum_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3800754Z test_all_reduce_sum_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3801493Z test_all_reduce_sum_cuda_async (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3802197Z test_all_reduce_sum_cuda_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3802887Z test_all_to_all (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3803588Z test_all_to_all_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3804307Z test_all_to_all_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3804991Z test_all_to_all_cuda_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3805712Z test_all_to_all_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3806446Z test_all_to_all_full_group_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3807246Z test_all_to_all_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3807927Z test_all_to_all_group_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3808679Z test_all_to_all_single_equal_split (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3809479Z test_all_to_all_single_equal_split_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3810254Z test_all_to_all_single_equal_split_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3811053Z test_all_to_all_single_equal_split_cuda_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3811918Z test_all_to_all_single_equal_split_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3812791Z test_all_to_all_single_equal_split_full_group_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3813624Z 
test_all_to_all_single_equal_split_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3814476Z test_all_to_all_single_equal_split_group_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3815246Z test_all_to_all_single_unequal_split (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3816183Z test_all_to_all_single_unequal_split_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3817695Z test_all_to_all_single_unequal_split_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3818530Z test_all_to_all_single_unequal_split_cuda_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3819404Z test_all_to_all_single_unequal_split_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3820249Z test_all_to_all_single_unequal_split_full_group_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3821056Z test_all_to_all_single_unequal_split_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3821853Z test_all_to_all_single_unequal_split_group_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3822668Z test_average_parameters (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3823363Z test_backend_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3824017Z test_backend_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3824684Z test_barrier (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3825337Z test_barrier_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3826165Z test_barrier_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3826846Z test_barrier_full_group_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3827563Z test_barrier_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3828236Z test_barrier_group_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3828932Z test_barrier_timeout_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3829692Z 
test_barrier_timeout_global (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3830397Z test_barrier_timeout_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3831120Z test_batch_isend_irecv_gloo (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3831800Z test_batch_isend_irecv_gloo_tags (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3832550Z test_batch_isend_irecv_mixed_backend_err (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3833337Z test_batch_isend_irecv_nccl (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3834059Z test_batch_isend_irecv_no_rank_zero_nccl (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3834912Z test_batch_isend_irecv_op_err (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3835732Z test_batch_isend_irecv_op_list_err (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3836370Z test_batch_isend_irecv_ring_exchange_nccl (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3836775Z test_batch_isend_irecv_self_nccl (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3837184Z test_batch_isend_irecv_tensor_err (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3837568Z test_broadcast (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3837921Z test_broadcast_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3838306Z test_broadcast_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3838695Z test_broadcast_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3839060Z test_broadcast_multigpu (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3839456Z test_broadcast_object_list (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3839905Z test_compute_bucket_assignment_by_size_sparse_error_with_logger (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3840412Z test_compute_bucket_assignment_by_size_sparse_error_without_logger (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3840853Z test_ddp_apply_optim_in_backward 
(__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3841311Z test_ddp_apply_optim_in_backward_grad_as_bucket_view_false (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3870932Z test_ddp_apply_optim_in_backward_ignored_params (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3871449Z test_ddp_broadcast_buffer (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3872038Z test_ddp_broadcast_buffer_via_hook (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3872454Z test_ddp_buffer_hook_allreduce (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3872875Z test_ddp_buffer_hook_allreduce_return_future (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3873328Z test_ddp_build_debug_param_to_name_mapping (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3873792Z test_ddp_build_debug_param_to_name_mapping_requires_grad (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3874229Z test_ddp_comm_hook_logging (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3874631Z test_ddp_control_flow_different_across_ranks (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3875066Z test_ddp_control_flow_same_across_ranks (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3875470Z test_ddp_create_graph (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3875822Z test_ddp_device (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3876213Z test_ddp_forward_backward_hook (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3876622Z test_ddp_grad_div_uneven_inputs (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3877100Z test_ddp_hook_parity_allreduce (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3877525Z test_ddp_hook_parity_allreduce_process_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3877959Z test_ddp_hook_parity_post_localSGD (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3878371Z test_ddp_hook_parity_powerSGD (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3878763Z 
test_ddp_hook_pickling_powerSGD (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3879219Z test_ddp_hook_with_optimizer_parity_adam_optimize_subset_False (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3879717Z test_ddp_hook_with_optimizer_parity_adam_optimize_subset_True (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3880275Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_False_optimize_subset_False (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3880881Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_False_optimize_subset_True (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3881482Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_True_optimize_subset_False (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3882079Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_True_optimize_subset_True (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3882680Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_False_optimize_subset_False (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3883264Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_False_optimize_subset_True (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3883856Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_True_optimize_subset_False (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3884459Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_True_optimize_subset_True (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3884996Z test_ddp_hook_with_optimizer_parity_sgd_optimize_subset_False (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3885466Z test_ddp_hook_with_optimizer_parity_sgd_optimize_subset_True (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3885903Z 
test_ddp_ignore_params_arg (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3886285Z test_ddp_inference (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3886662Z test_ddp_join_model_equivalence (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3887067Z test_ddp_logging_data_cpu (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3887458Z test_ddp_logging_data_gpu (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3887876Z test_ddp_model_diff_num_params_across_ranks (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3888357Z test_ddp_model_diff_shape_across_ranks (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3888820Z test_ddp_multiple_nested_unused_params_err_ignore_params (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3889282Z test_ddp_multiple_nested_unused_params_error (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3889675Z test_ddp_namedtuple (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3890061Z test_ddp_new_tensor_in_fwd (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3890470Z test_ddp_new_tensor_in_fwd_static_graph (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3890900Z test_ddp_profiling_autograd_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3891306Z test_ddp_profiling_torch_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3891716Z test_ddp_python_error_logged (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3892122Z test_ddp_returns_tensor_with_no_grad (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3892536Z test_ddp_shared_grad_acc_unused_params (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3892963Z test_ddp_static_graph_nested_types (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3893443Z test_ddp_sync_bn_training_vs_eval (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3893841Z test_ddp_sync_module_states (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3894244Z 
test_ddp_uneven_input_exception (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3894658Z test_ddp_uneven_input_join_disable (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3895058Z test_ddp_uneven_inputs (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3895455Z test_ddp_uneven_inputs_stop_iteration_sync_bn (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3895908Z test_ddp_unused_params_rebuild_buckets_exception (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3896341Z test_ddp_zero_output_features (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3897400Z test_destroy_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3897788Z test_destroy_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3898193Z test_detect_ddp_is_actually_static (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3898608Z test_different_graph_across_ranks (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3899002Z test_dump_DDP_relevant_env_vars (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3899377Z test_gather (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3899738Z test_gather_checks (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3900085Z test_gather_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3900459Z test_gather_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3900835Z test_gather_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3901184Z test_gather_object (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3901565Z test_gather_object_subgroup (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3901951Z test_get_backend (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3902295Z test_get_future (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3902650Z test_get_rank (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3903026Z test_get_rank_size_full_group (__main__.TestDistBackendWithSpawn) 
2023-01-11T22:10:24.3903417Z test_get_rank_size_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3903790Z test_invalid_static_graph (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3904154Z test_irecv (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3904498Z test_isend (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3904857Z test_isend_autograd_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3905255Z test_isend_torch_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3905662Z test_monitored_barrier_allreduce_hang (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3906207Z test_monitored_barrier_allreduce_hang_wait_all_ranks (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3906654Z test_monitored_barrier_failure_order (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3907069Z test_monitored_barrier_gloo (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3907547Z test_monitored_barrier_gloo_rank_0_timeout (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3907963Z test_monitored_barrier_gloo_subgroup (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3908387Z test_monitored_barrier_wait_all_ranks (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3908808Z test_nccl_backend_bool_allgather (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3909202Z test_nccl_backend_bool_allreduce (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3909607Z test_nccl_backend_bool_broadcast (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3910013Z test_nccl_backend_bool_reduce (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3910422Z test_nccl_high_priority_stream (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3910792Z test_new_subgroups (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3911257Z test_new_subgroups_by_enumeration (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3911734Z 
test_new_subgroups_by_enumeration_input_rank_exceeds_world_size (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3912204Z test_new_subgroups_by_enumeration_negative_input_rank (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3912670Z test_new_subgroups_group_size_exceeds_world_size (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3913117Z test_new_subgroups_overlap_not_allowed (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3913572Z test_new_subgroups_world_size_not_divisible_by_group_size (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3914006Z test_output_unused_in_loss_dict_module (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3914435Z test_output_unused_in_loss_tuple_module (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3914855Z test_periodic_model_averager (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3915259Z test_periodic_model_averager_param_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3915693Z test_post_localSGD_optimizer_parity (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3916135Z test_post_localSGD_optimizer_parity_grad_is_view (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3916590Z test_post_localSGD_optimizer_parity_with_hierarchical_sgd (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3917096Z test_post_localSGD_optimizer_parity_with_hierarchical_sgd_grad_is_view (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3917572Z test_post_localSGD_optimizer_step_reload (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3917985Z test_reduce_full_group_max (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3918358Z test_reduce_full_group_min (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3918749Z test_reduce_full_group_product (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3919153Z test_reduce_full_group_sum (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3919517Z test_reduce_group_max 
(__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3919898Z test_reduce_group_min (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3920276Z test_reduce_group_product (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3920660Z test_reduce_group_sum (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3921006Z test_reduce_max (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3921363Z test_reduce_min (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3921733Z test_reduce_multigpu (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3922092Z test_reduce_product (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3922481Z test_reduce_scatter_tensor_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3922880Z test_reduce_scatter_v_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3923301Z test_reduce_sum (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3923669Z test_reduce_sum_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3924057Z test_reduce_sum_cuda_twice (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3924440Z test_reduce_sum_twice (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3924782Z test_scatter (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3925145Z test_scatter_checks (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3925517Z test_scatter_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3925866Z test_scatter_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3926244Z test_scatter_cuda_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3926629Z test_scatter_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3926988Z test_scatter_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3927367Z test_scatter_object_list (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3927736Z test_send_recv (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3928089Z test_send_recv_any_source 
(__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3928554Z test_send_recv_any_source_autograd_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3929007Z test_send_recv_any_source_torch_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3929428Z test_send_recv_autograd_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3929802Z test_send_recv_nccl (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3930199Z test_send_recv_nccl_autograd_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3930617Z test_send_recv_nccl_torch_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3931007Z test_send_recv_torch_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3931395Z test_send_recv_with_tag (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3931803Z test_send_recv_with_tag_autograd_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3932236Z test_send_recv_with_tag_torch_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3932630Z test_sparse_all_reduce_sum (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3933027Z test_sparse_all_reduce_sum_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3933426Z test_stateless_api_with_ddp (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3933803Z test_static_graph_api_cpu (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3934179Z test_sync_bn_logged (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3934591Z test_undefined_grad_parity_unused_parameters (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3935018Z test_verify_model_across_rank_with_logger (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3935458Z test_verify_model_across_rank_without_logger (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.3936180Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.3937097Z 
warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.3937705Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.3938168Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.3938394Z 2023-01-11T22:10:24.3938504Z Running tests... 2023-01-11T22:10:24.3938892Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.3939410Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.3939988Z test_1_level_hierarchical_model_averager_equivalent_to_periodic_model_averager (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.3940542Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 30088 2023-01-11T22:10:24.3941081Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 30089 2023-01-11T22:10:24.3941684Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.3942131Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.3942699Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.3943146Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.3943708Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.3944144Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.3944682Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.3945139Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 
2023-01-11T22:10:24.3945585Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.3946161Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.3946816Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.3947492Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.3948007Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.3948471Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.3948971Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager:Model averaging hierarchy: 2023-01-11T22:10:24.3949795Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager: Each group that has 2 processes average parameters every 4 iterations, if no higher-level averaging. 2023-01-11T22:10:24.3950459Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager:Model averaging hierarchy: 2023-01-11T22:10:24.3951265Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager: Each group that has 2 processes average parameters every 4 iterations, if no higher-level averaging. 2023-01-11T22:10:24.3951892Z [1673473479.016519] [7c5487d9c02b:30088:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.3952432Z [1673473479.017562] [7c5487d9c02b:30089:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.3952926Z [1673473479.030402] [7c5487d9c02b:30088:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.3953393Z [1673473479.030402] [7c5487d9c02b:30088:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.3953839Z [1673473479.030229] [7c5487d9c02b:30089:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.3954295Z [1673473479.030229] [7c5487d9c02b:30089:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.3954812Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager:Model averaging hierarchy: 2023-01-11T22:10:24.3955633Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager: Each group that has 2 processes average parameters every 4 iterations, if no higher-level averaging. 2023-01-11T22:10:24.3956272Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager:Model averaging hierarchy: 2023-01-11T22:10:24.3957078Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager: Each group that has 2 processes average parameters every 4 iterations, if no higher-level averaging. 2023-01-11T22:10:24.3957788Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager:Model averaging hierarchy: 2023-01-11T22:10:24.3958594Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager: Each group that has 2 processes average parameters every 4 iterations, if no higher-level averaging. 2023-01-11T22:10:24.3959223Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager:Model averaging hierarchy: 2023-01-11T22:10:24.3960027Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager: Each group that has 2 processes average parameters every 4 iterations, if no higher-level averaging. 
2023-01-11T22:10:24.3960668Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager:Model averaging hierarchy: 2023-01-11T22:10:24.3961465Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager: Each group that has 2 processes average parameters every 4 iterations, if no higher-level averaging. 2023-01-11T22:10:24.3962148Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager:Model averaging hierarchy: 2023-01-11T22:10:24.3963035Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager: Each group that has 2 processes average parameters every 4 iterations, if no higher-level averaging. 2023-01-11T22:10:24.3963508Z ok (7.247s) 2023-01-11T22:10:24.3963656Z 2023-01-11T22:10:24.3963924Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.3964229Z Ran 1 test in 7.247s 2023-01-11T22:10:24.3964388Z 2023-01-11T22:10:24.3964480Z OK 2023-01-11T22:10:24.3964614Z 2023-01-11T22:10:24.3964737Z Generating XML reports... 2023-01-11T22:10:24.3965313Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214432.xml 2023-01-11T22:10:24.3966018Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.3966465Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.3967030Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.3967474Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.3967700Z 2023-01-11T22:10:24.3967808Z Running tests... 
2023-01-11T22:10:24.3968203Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.3968720Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.3969222Z test_3_level_hierarchical_model_averager (__main__.TestDistBackendWithSpawn) ... skip: Test requires world size of 4 (0.003s) 2023-01-11T22:10:24.3969525Z 2023-01-11T22:10:24.3969785Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.3970109Z Ran 1 test in 0.004s 2023-01-11T22:10:24.3970269Z 2023-01-11T22:10:24.3970359Z OK (skipped=1) 2023-01-11T22:10:24.3970511Z 2023-01-11T22:10:24.3970636Z Generating XML reports... 2023-01-11T22:10:24.3971227Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214442.xml 2023-01-11T22:10:24.3971926Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.3972353Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.3972916Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.3973375Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.3973601Z 2023-01-11T22:10:24.3973691Z Running tests... 2023-01-11T22:10:24.3974089Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.3974683Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.3975186Z test_Backend_enum_class (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.3975653Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 30236 2023-01-11T22:10:24.3976098Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 30237 2023-01-11T22:10:24.3977215Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.3977664Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.3978245Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.3978708Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.3979280Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.3979696Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.3980343Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.3980818Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.3981267Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.3981738Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.3982388Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.3983066Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.3983568Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.3984034Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.3984366Z ok (4.242s) 2023-01-11T22:10:24.3984514Z 2023-01-11T22:10:24.3984781Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.3985086Z Ran 1 test in 4.242s 2023-01-11T22:10:24.3985243Z 2023-01-11T22:10:24.3985338Z OK 2023-01-11T22:10:24.3985468Z 2023-01-11T22:10:24.3985590Z Generating XML reports... 2023-01-11T22:10:24.3986167Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214444.xml 2023-01-11T22:10:24.3986869Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.3987304Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.3987865Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.3988312Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.3988539Z 2023-01-11T22:10:24.3988647Z Running tests... 2023-01-11T22:10:24.3989043Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.3989543Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.3990060Z test_DistributedDataParallel (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.3991100Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77317 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. 
(1.589s) 2023-01-11T22:10:24.3991704Z 2023-01-11T22:10:24.3991969Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.3992292Z Ran 1 test in 1.589s 2023-01-11T22:10:24.3992452Z 2023-01-11T22:10:24.3992546Z OK (skipped=1) 2023-01-11T22:10:24.3992699Z 2023-01-11T22:10:24.3992821Z Generating XML reports... 2023-01-11T22:10:24.3993404Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214451.xml 2023-01-11T22:10:24.3994104Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.3994533Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.3995096Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.3995558Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.3995784Z 2023-01-11T22:10:24.3995878Z Running tests... 2023-01-11T22:10:24.3996277Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.3996844Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.3997380Z test_DistributedDataParallelCPU (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.3997872Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 30373 2023-01-11T22:10:24.3998316Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 30374 2023-01-11T22:10:24.3998913Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.3999338Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.3999903Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4000366Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4000932Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4001352Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4001911Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4002365Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4002795Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4003289Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4003933Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4004609Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4005112Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4005576Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4006043Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4006520Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4007021Z [1673473499.723196] [7c5487d9c02b:30374:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4007577Z [1673473501.163678] [7c5487d9c02b:30374:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4008048Z [1673473501.163678] [7c5487d9c02b:30374:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4008624Z [1673473499.702770] [7c5487d9c02b:30373:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4009100Z [1673473501.127443] [7c5487d9c02b:30373:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4009555Z [1673473501.127443] [7c5487d9c02b:30373:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4009890Z ok (6.142s) 2023-01-11T22:10:24.4010036Z 2023-01-11T22:10:24.4010310Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4010615Z Ran 1 test in 6.142s 2023-01-11T22:10:24.4010773Z 2023-01-11T22:10:24.4010867Z OK 2023-01-11T22:10:24.4011000Z 2023-01-11T22:10:24.4011121Z Generating XML reports... 
2023-01-11T22:10:24.4011699Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214455.xml 2023-01-11T22:10:24.4012404Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4012901Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4013472Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4013916Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4014146Z 2023-01-11T22:10:24.4014256Z Running tests... 2023-01-11T22:10:24.4014653Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4015158Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4015701Z test_DistributedDataParallelCPU_grad_is_view (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4016225Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 30487 2023-01-11T22:10:24.4017124Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 30488 2023-01-11T22:10:24.4017757Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4018202Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4018764Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4019205Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4019769Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4020207Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4020767Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4021209Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4021657Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4022149Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4022794Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4023455Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4023965Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4024427Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4024880Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4025462Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4025983Z [1673473508.553836] [7c5487d9c02b:30487:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4026484Z [1673473509.975791] [7c5487d9c02b:30487:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4026933Z [1673473509.975791] [7c5487d9c02b:30487:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4027437Z [1673473508.574807] [7c5487d9c02b:30488:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4027926Z [1673473509.988190] [7c5487d9c02b:30488:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4028385Z [1673473509.988190] [7c5487d9c02b:30488:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4028700Z ok (6.254s) 2023-01-11T22:10:24.4028848Z 2023-01-11T22:10:24.4029181Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4029513Z Ran 1 test in 6.254s 2023-01-11T22:10:24.4029672Z 2023-01-11T22:10:24.4029764Z OK 2023-01-11T22:10:24.4029878Z 2023-01-11T22:10:24.4029999Z Generating XML reports... 
2023-01-11T22:10:24.4030596Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214504.xml 2023-01-11T22:10:24.4031293Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4031718Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4032280Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4032743Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4032968Z 2023-01-11T22:10:24.4033075Z Running tests... 2023-01-11T22:10:24.4033460Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4033979Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4034519Z test_DistributedDataParallel_SyncBatchNorm (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4035022Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 30601 2023-01-11T22:10:24.4035465Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 30602 2023-01-11T22:10:24.4036045Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4036483Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4037033Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4037496Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4038064Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4038497Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4039040Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4039492Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4039938Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4040412Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4041135Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4041818Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4042328Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4042775Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4043275Z [1673473518.786532] [7c5487d9c02b:30602:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4043776Z [1673473518.811129] [7c5487d9c02b:30602:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4044240Z [1673473518.811129] [7c5487d9c02b:30602:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4044731Z [1673473518.782948] [7c5487d9c02b:30601:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4045271Z [1673473518.807370] [7c5487d9c02b:30601:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4045741Z [1673473518.807370] [7c5487d9c02b:30601:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4046206Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4046666Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4047138Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4047607Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4048074Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4048525Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:10:24.4048997Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4049460Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4049910Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4050373Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4050833Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4051298Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4051625Z ok (7.269s) 2023-01-11T22:10:24.4051770Z 2023-01-11T22:10:24.4052043Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4052369Z Ran 1 test in 7.269s 2023-01-11T22:10:24.4052528Z 2023-01-11T22:10:24.4052604Z OK 2023-01-11T22:10:24.4052737Z 2023-01-11T22:10:24.4052863Z Generating XML reports... 2023-01-11T22:10:24.4053461Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214513.xml 2023-01-11T22:10:24.4054158Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4054587Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4055153Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4055611Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4055836Z 2023-01-11T22:10:24.4055927Z Running tests... 
2023-01-11T22:10:24.4056318Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4057277Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4057817Z test_DistributedDataParallel_SyncBatchNorm_2D_Input (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4058323Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 30719 2023-01-11T22:10:24.4058761Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 30720 2023-01-11T22:10:24.4059345Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4059754Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4060299Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4060745Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4061301Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4061811Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4062393Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4062848Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4063274Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4063767Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4064409Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4065086Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4065588Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4066050Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4066552Z [1673473528.433844] [7c5487d9c02b:30719:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4067053Z [1673473528.447166] [7c5487d9c02b:30719:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4067500Z [1673473528.447166] [7c5487d9c02b:30719:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4068001Z [1673473528.442776] [7c5487d9c02b:30720:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4068492Z [1673473528.456255] [7c5487d9c02b:30720:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4068953Z [1673473528.456255] [7c5487d9c02b:30720:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4069406Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4069883Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4070356Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4070827Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:10:24.4071151Z ok (6.044s) 2023-01-11T22:10:24.4071297Z 2023-01-11T22:10:24.4071572Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4071895Z Ran 1 test in 6.044s 2023-01-11T22:10:24.4072055Z 2023-01-11T22:10:24.4072202Z OK 2023-01-11T22:10:24.4072339Z 2023-01-11T22:10:24.4072462Z Generating XML reports... 2023-01-11T22:10:24.4073067Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214523.xml 2023-01-11T22:10:24.4073769Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4074196Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4074761Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4075223Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4075449Z 2023-01-11T22:10:24.4075539Z Running tests... 2023-01-11T22:10:24.4075936Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4076457Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4077018Z test_DistributedDataParallel_SyncBatchNorm_Channels_Last (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4077591Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 30837 2023-01-11T22:10:24.4078051Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 30838 2023-01-11T22:10:24.4078650Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4079073Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4079636Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4080096Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4080664Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4081090Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4081652Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4082109Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4082556Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4083030Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4083674Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4084348Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4084843Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4085310Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4085815Z [1673473537.107545] [7c5487d9c02b:30838:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4086313Z [1673473537.120339] [7c5487d9c02b:30838:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4086759Z [1673473537.120339] [7c5487d9c02b:30838:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4087263Z [1673473537.100075] [7c5487d9c02b:30837:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4087753Z [1673473537.112897] [7c5487d9c02b:30837:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4088211Z [1673473537.112897] [7c5487d9c02b:30837:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4088723Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4089205Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4089679Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4090149Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4090598Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4091053Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:10:24.4091512Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4091958Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4092306Z ok (6.238s) 2023-01-11T22:10:24.4092453Z 2023-01-11T22:10:24.4092726Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4093125Z Ran 1 test in 6.238s 2023-01-11T22:10:24.4093276Z 2023-01-11T22:10:24.4093369Z OK 2023-01-11T22:10:24.4093500Z 2023-01-11T22:10:24.4093623Z Generating XML reports... 2023-01-11T22:10:24.4094223Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214531.xml 2023-01-11T22:10:24.4094908Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4095351Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4095914Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4096376Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4096799Z 2023-01-11T22:10:24.4096902Z Running tests... 2023-01-11T22:10:24.4097311Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4097834Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4098397Z test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_Running_Value (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4098951Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 30955 2023-01-11T22:10:24.4099402Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 30956 2023-01-11T22:10:24.4100000Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4100422Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4100986Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4101449Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4102020Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4102443Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4102999Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4103455Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4103884Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4104377Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4105020Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4105809Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4106305Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4106768Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4107323Z [1673473545.908389] [7c5487d9c02b:30956:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4107828Z [1673473545.921853] [7c5487d9c02b:30956:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4108277Z [1673473545.921853] [7c5487d9c02b:30956:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4108781Z [1673473545.905975] [7c5487d9c02b:30955:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4109346Z [1673473545.919905] [7c5487d9c02b:30955:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4109822Z [1673473545.919905] [7c5487d9c02b:30955:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4110273Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4110748Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4111091Z ok (6.358s) 2023-01-11T22:10:24.4111237Z 2023-01-11T22:10:24.4111512Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4111816Z Ran 1 test in 6.358s 2023-01-11T22:10:24.4111975Z 2023-01-11T22:10:24.4112068Z OK 2023-01-11T22:10:24.4112200Z 2023-01-11T22:10:24.4112328Z Generating XML reports... 
2023-01-11T22:10:24.4112908Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214540.xml 2023-01-11T22:10:24.4113613Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4114058Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4114618Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4115062Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4115286Z 2023-01-11T22:10:24.4115393Z Running tests... 2023-01-11T22:10:24.4115787Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4116287Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4116861Z test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_gradient (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4117418Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31073 2023-01-11T22:10:24.4117868Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31074 2023-01-11T22:10:24.4118447Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4118888Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4119453Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4119896Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4120466Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4120899Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4121528Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4121974Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4122418Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4122910Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4123557Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4124219Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4124733Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4125197Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4125742Z [1673473554.742486] [7c5487d9c02b:31073:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4126258Z [1673473554.756011] [7c5487d9c02b:31073:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4126727Z [1673473554.756011] [7c5487d9c02b:31073:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4127229Z [1673473554.746181] [7c5487d9c02b:31074:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4127713Z [1673473554.759540] [7c5487d9c02b:31074:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4128157Z [1673473554.759540] [7c5487d9c02b:31074:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4128629Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4129107Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4129557Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4130034Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:10:24.4130374Z ok (6.863s) 2023-01-11T22:10:24.4130520Z 2023-01-11T22:10:24.4130789Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4131095Z Ran 1 test in 6.864s 2023-01-11T22:10:24.4131252Z 2023-01-11T22:10:24.4131345Z OK 2023-01-11T22:10:24.4131479Z 2023-01-11T22:10:24.4131601Z Generating XML reports... 2023-01-11T22:10:24.4132181Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214549.xml 2023-01-11T22:10:24.4132888Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4133333Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4133897Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4134343Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4134569Z 2023-01-11T22:10:24.4134679Z Running tests... 2023-01-11T22:10:24.4135075Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4135576Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4136127Z test_DistributedDataParallel_SyncBatchNorm_No_Affine (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4136848Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31191 2023-01-11T22:10:24.4137398Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31192 2023-01-11T22:10:24.4137994Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4138438Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4139002Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4139464Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4140016Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4140454Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4141008Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4141452Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4141962Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4142469Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4143119Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4143779Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4144295Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4144756Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4145264Z [1673473564.181518] [7c5487d9c02b:31191:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4145755Z [1673473564.195775] [7c5487d9c02b:31191:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4146220Z [1673473564.195775] [7c5487d9c02b:31191:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4146723Z [1673473564.181541] [7c5487d9c02b:31192:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4147213Z [1673473564.195789] [7c5487d9c02b:31192:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4147656Z [1673473564.195789] [7c5487d9c02b:31192:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4148123Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4148601Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4149057Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4149531Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:10:24.4149874Z ok (6.726s) 2023-01-11T22:10:24.4150022Z 2023-01-11T22:10:24.4150291Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4150595Z Ran 1 test in 6.726s 2023-01-11T22:10:24.4150756Z 2023-01-11T22:10:24.4150849Z OK 2023-01-11T22:10:24.4150981Z 2023-01-11T22:10:24.4151104Z Generating XML reports... 2023-01-11T22:10:24.4151684Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214558.xml 2023-01-11T22:10:24.4152380Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4152888Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4153462Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4153911Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4154139Z 2023-01-11T22:10:24.4154246Z Running tests... 2023-01-11T22:10:24.4154645Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4155147Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4155723Z test_DistributedDataParallel_SyncBatchNorm_Single_Input_Per_Process (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4156276Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31309 2023-01-11T22:10:24.4156720Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31310 2023-01-11T22:10:24.4157313Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4157807Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4158381Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4158845Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4159397Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4159837Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4160390Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4160827Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4161275Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4161766Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4162415Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4163078Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4163589Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4164052Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4164560Z [1673473573.486010] [7c5487d9c02b:31310:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4165041Z [1673473573.499534] [7c5487d9c02b:31310:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4165509Z [1673473573.499534] [7c5487d9c02b:31310:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4166012Z [1673473573.484847] [7c5487d9c02b:31309:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4166500Z [1673473573.498854] [7c5487d9c02b:31309:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4166942Z [1673473573.498854] [7c5487d9c02b:31309:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4167405Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4167883Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4168424Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4168877Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:10:24.4169220Z ok (6.139s) 2023-01-11T22:10:24.4169368Z 2023-01-11T22:10:24.4169639Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4169944Z Ran 1 test in 6.139s 2023-01-11T22:10:24.4170102Z 2023-01-11T22:10:24.4170194Z OK 2023-01-11T22:10:24.4170327Z 2023-01-11T22:10:24.4170450Z Generating XML reports... 2023-01-11T22:10:24.4171025Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214608.xml 2023-01-11T22:10:24.4171722Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4172168Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4172732Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4173180Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4173407Z 2023-01-11T22:10:24.4173583Z Running tests... 2023-01-11T22:10:24.4173992Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4174511Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4175033Z test_DistributedDataParallel_non_default_stream (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4176085Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/76428 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. 
(1.625s) 2023-01-11T22:10:24.4176770Z 2023-01-11T22:10:24.4177047Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4177376Z Ran 1 test in 1.625s 2023-01-11T22:10:24.4177522Z 2023-01-11T22:10:24.4177628Z OK (skipped=1) 2023-01-11T22:10:24.4177783Z 2023-01-11T22:10:24.4177905Z Generating XML reports... 2023-01-11T22:10:24.4178500Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214616.xml 2023-01-11T22:10:24.4179180Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4179623Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4180185Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4180642Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4180867Z 2023-01-11T22:10:24.4180958Z Running tests... 2023-01-11T22:10:24.4181358Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4181877Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4182417Z test_DistributedDataParallel_requires_grad (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4182920Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31461 2023-01-11T22:10:24.4183364Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31462 2023-01-11T22:10:24.4183962Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4184385Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4184950Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4185413Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4186087Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4186510Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4187067Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4187522Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4187949Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4188439Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4189088Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4189765Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4190268Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4190799Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4191151Z ok (4.340s) 2023-01-11T22:10:24.4191300Z 2023-01-11T22:10:24.4191570Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4191880Z Ran 1 test in 4.340s 2023-01-11T22:10:24.4192040Z 2023-01-11T22:10:24.4192134Z OK 2023-01-11T22:10:24.4192267Z 2023-01-11T22:10:24.4192390Z Generating XML reports... 2023-01-11T22:10:24.4192972Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214620.xml 2023-01-11T22:10:24.4193669Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4194119Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4194685Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4195134Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4195360Z 2023-01-11T22:10:24.4195468Z Running tests... 2023-01-11T22:10:24.4195864Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4196364Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4196918Z test_DistributedDataParallel_with_amp_and_grad_is_view (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4197968Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77294 for platform(s) linux. 
If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.636s) 2023-01-11T22:10:24.4198481Z 2023-01-11T22:10:24.4198743Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4199050Z Ran 1 test in 1.636s 2023-01-11T22:10:24.4199211Z 2023-01-11T22:10:24.4199317Z OK (skipped=1) 2023-01-11T22:10:24.4199472Z 2023-01-11T22:10:24.4199596Z Generating XML reports... 2023-01-11T22:10:24.4200190Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214627.xml 2023-01-11T22:10:24.4200874Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4201317Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4201880Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4202340Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4202608Z 2023-01-11T22:10:24.4202716Z Running tests... 2023-01-11T22:10:24.4203119Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4203644Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4204145Z test_DistributedSampler_padding (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4204643Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31598 2023-01-11T22:10:24.4205089Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31599 2023-01-11T22:10:24.4205685Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4206109Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4206679Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4207140Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4207802Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4208252Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4208815Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4209272Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4209697Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4210193Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4210839Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4211522Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4212021Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4212483Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4212994Z [1673473597.425192] [7c5487d9c02b:31598:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4213478Z [1673473597.439141] [7c5487d9c02b:31598:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4213946Z [1673473597.439141] [7c5487d9c02b:31598:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4214450Z [1673473597.432868] [7c5487d9c02b:31599:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4214950Z [1673473597.445973] [7c5487d9c02b:31599:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4215390Z [1673473597.445973] [7c5487d9c02b:31599:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4215730Z ok (6.273s) 2023-01-11T22:10:24.4215878Z 2023-01-11T22:10:24.4216146Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4216469Z Ran 1 test in 6.273s 2023-01-11T22:10:24.4216895Z 2023-01-11T22:10:24.4216995Z OK 2023-01-11T22:10:24.4217132Z 2023-01-11T22:10:24.4217257Z Generating XML reports... 
2023-01-11T22:10:24.4217867Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214632.xml 2023-01-11T22:10:24.4218553Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4219096Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4219672Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4220136Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4220363Z 2023-01-11T22:10:24.4220454Z Running tests... 2023-01-11T22:10:24.4220848Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4221367Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4221863Z test_SyncBatchNorm_process_group (__main__.TestDistBackendWithSpawn) ... skip: no torchvision (0.002s) 2023-01-11T22:10:24.4222129Z 2023-01-11T22:10:24.4222390Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4222712Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4222874Z 2023-01-11T22:10:24.4222980Z OK (skipped=1) 2023-01-11T22:10:24.4223131Z 2023-01-11T22:10:24.4223236Z Generating XML reports... 
2023-01-11T22:10:24.4223891Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214640.xml 2023-01-11T22:10:24.4224612Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4225055Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4225603Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4226062Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4226288Z 2023-01-11T22:10:24.4226395Z Running tests... 2023-01-11T22:10:24.4226773Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4227288Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4227738Z test_accumulate_gradients_no_sync (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.4228233Z Runs _test_accumulate_gradients_no_sync using default inputs ... skip: get_future is only supported on mpi, nccl and gloo (0.002s) 2023-01-11T22:10:24.4228511Z 2023-01-11T22:10:24.4228774Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4229092Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4229251Z 2023-01-11T22:10:24.4229358Z OK (skipped=1) 2023-01-11T22:10:24.4229509Z 2023-01-11T22:10:24.4229613Z Generating XML reports... 
2023-01-11T22:10:24.4230204Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214643.xml 2023-01-11T22:10:24.4230902Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4231344Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4231895Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4232359Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4232588Z 2023-01-11T22:10:24.4232695Z Running tests... 2023-01-11T22:10:24.4233073Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4233592Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4234062Z test_accumulate_gradients_no_sync_allreduce_hook (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.4234583Z Runs multiple iterations on _test_accumulate_gradients_no_sync ... skip: get_future is only supported on mpi, nccl and gloo (0.002s) 2023-01-11T22:10:24.4234883Z 2023-01-11T22:10:24.4235126Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4235511Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4235670Z 2023-01-11T22:10:24.4235776Z OK (skipped=1) 2023-01-11T22:10:24.4235927Z 2023-01-11T22:10:24.4236049Z Generating XML reports... 
2023-01-11T22:10:24.4236633Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214645.xml 2023-01-11T22:10:24.4237334Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4237779Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4238328Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4238790Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4239017Z 2023-01-11T22:10:24.4239125Z Running tests... 2023-01-11T22:10:24.4239521Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4240022Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4240509Z test_accumulate_gradients_no_sync_allreduce_with_then_hook (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.4241111Z Runs multiple iterations on _test_accumulate_gradients_no_sync using allreduce ... skip: get_future is only supported on mpi, nccl and gloo (0.002s) 2023-01-11T22:10:24.4241440Z 2023-01-11T22:10:24.4241703Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4242006Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4242167Z 2023-01-11T22:10:24.4242275Z OK (skipped=1) 2023-01-11T22:10:24.4242429Z 2023-01-11T22:10:24.4242550Z Generating XML reports... 
2023-01-11T22:10:24.4243126Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214648.xml 2023-01-11T22:10:24.4243823Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4244272Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4244837Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4245280Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4245505Z 2023-01-11T22:10:24.4245612Z Running tests... 2023-01-11T22:10:24.4246007Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4246507Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4246972Z test_accumulate_gradients_no_sync_grad_is_view (__main__.TestDistBackendWithSpawn) 2023-01-11T22:10:24.4247478Z Runs _test_accumulate_gradients_no_sync using default inputs ... skip: get_future is only supported on mpi, nccl and gloo (0.002s) 2023-01-11T22:10:24.4247771Z 2023-01-11T22:10:24.4248032Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4248339Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4248497Z 2023-01-11T22:10:24.4248603Z OK (skipped=1) 2023-01-11T22:10:24.4248755Z 2023-01-11T22:10:24.4248876Z Generating XML reports... 
2023-01-11T22:10:24.4249455Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214650.xml 2023-01-11T22:10:24.4250157Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4250599Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4251165Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4251612Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4251837Z 2023-01-11T22:10:24.4251946Z Running tests... 2023-01-11T22:10:24.4252343Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4252913Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4253401Z test_all_gather (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4253879Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31877 2023-01-11T22:10:24.4254325Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31878 2023-01-11T22:10:24.4254905Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4255342Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4255903Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4256366Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4257179Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4257629Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4258266Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4258717Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4259165Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4259662Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4260315Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4260980Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4261500Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4262074Z STAGE:2023-01-11 21:46:56 31877:31877 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4262548Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4263165Z STAGE:2023-01-11 21:46:56 31878:31878 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4263678Z [1673473616.721676] [7c5487d9c02b:31878:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4264182Z [1673473618.375330] [7c5487d9c02b:31878:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4264630Z [1673473618.375330] [7c5487d9c02b:31878:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4265215Z STAGE:2023-01-11 21:46:58 31878:31878 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4265734Z [1673473616.701592] [7c5487d9c02b:31877:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.4266231Z [1673473618.337161] [7c5487d9c02b:31877:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4266675Z [1673473618.337161] [7c5487d9c02b:31877:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4267249Z STAGE:2023-01-11 21:46:58 31877:31877 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4267833Z STAGE:2023-01-11 21:46:58 31877:31877 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4268423Z STAGE:2023-01-11 21:46:58 31878:31878 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4269065Z STAGE:2023-01-11 21:46:58 31877:31877 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4269628Z STAGE:2023-01-11 21:46:58 31878:31878 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4270198Z STAGE:2023-01-11 21:46:58 31877:31877 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4270767Z STAGE:2023-01-11 21:46:58 31878:31878 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4271323Z STAGE:2023-01-11 21:46:58 31877:31877 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4271912Z STAGE:2023-01-11 21:46:58 31878:31878 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4272257Z ok (6.621s) 2023-01-11T22:10:24.4272404Z 2023-01-11T22:10:24.4272665Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4272970Z Ran 1 test in 6.621s 2023-01-11T22:10:24.4273133Z 2023-01-11T22:10:24.4273224Z OK 2023-01-11T22:10:24.4273356Z 2023-01-11T22:10:24.4273478Z Generating XML reports... 
2023-01-11T22:10:24.4274112Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214652.xml 2023-01-11T22:10:24.4274823Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4275270Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4275841Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4276288Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4276516Z 2023-01-11T22:10:24.4276624Z Running tests... 2023-01-11T22:10:24.4277022Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4277528Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4278064Z test_all_gather_coalesced_complex (__main__.TestDistBackendWithSpawn) ... skip: ucc does not support all_gather_coalesced (0.002s) 2023-01-11T22:10:24.4278380Z 2023-01-11T22:10:24.4278645Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4278964Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4279105Z 2023-01-11T22:10:24.4279212Z OK (skipped=1) 2023-01-11T22:10:24.4279362Z 2023-01-11T22:10:24.4279484Z Generating XML reports... 
2023-01-11T22:10:24.4280075Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214701.xml 2023-01-11T22:10:24.4280755Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4281195Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4281758Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4282222Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4282448Z 2023-01-11T22:10:24.4282539Z Running tests... 2023-01-11T22:10:24.4282937Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4283454Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4283966Z test_all_gather_coalesced_full_group (__main__.TestDistBackendWithSpawn) ... skip: ucc does not support all_gather_coalesced (0.002s) 2023-01-11T22:10:24.4284282Z 2023-01-11T22:10:24.4284540Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4284859Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4285018Z 2023-01-11T22:10:24.4285123Z OK (skipped=1) 2023-01-11T22:10:24.4285273Z 2023-01-11T22:10:24.4285379Z Generating XML reports... 
2023-01-11T22:10:24.4285967Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214704.xml 2023-01-11T22:10:24.4286737Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4287182Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4287727Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4288187Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4288412Z 2023-01-11T22:10:24.4288519Z Running tests... 2023-01-11T22:10:24.4288895Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4289411Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4289936Z test_all_gather_coalesced_group (__main__.TestDistBackendWithSpawn) ... skip: ucc does not support all_gather_coalesced (0.002s) 2023-01-11T22:10:24.4290246Z 2023-01-11T22:10:24.4290505Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4290809Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4290967Z 2023-01-11T22:10:24.4291135Z OK (skipped=1) 2023-01-11T22:10:24.4291295Z 2023-01-11T22:10:24.4291420Z Generating XML reports... 
2023-01-11T22:10:24.4291996Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214706.xml 2023-01-11T22:10:24.4292696Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4293137Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4293701Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4294145Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4294375Z 2023-01-11T22:10:24.4294483Z Running tests... 2023-01-11T22:10:24.4294880Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4295381Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4295909Z test_all_gather_coalesced_simple (__main__.TestDistBackendWithSpawn) ... skip: ucc does not support all_gather_coalesced (0.002s) 2023-01-11T22:10:24.4296221Z 2023-01-11T22:10:24.4296480Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4297042Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4297206Z 2023-01-11T22:10:24.4297297Z OK (skipped=1) 2023-01-11T22:10:24.4297450Z 2023-01-11T22:10:24.4297573Z Generating XML reports... 
2023-01-11T22:10:24.4298176Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214709.xml 2023-01-11T22:10:24.4298545Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4298723Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4299083Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4299272Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4299292Z 2023-01-11T22:10:24.4299399Z Running tests... 2023-01-11T22:10:24.4299660Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4299965Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4300252Z test_all_gather_coalesced_with_empty (__main__.TestDistBackendWithSpawn) ... skip: ucc does not support all_gather_coalesced (0.003s) 2023-01-11T22:10:24.4300272Z 2023-01-11T22:10:24.4300530Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4300640Z Ran 1 test in 0.003s 2023-01-11T22:10:24.4300744Z 2023-01-11T22:10:24.4300858Z OK (skipped=1) 2023-01-11T22:10:24.4300877Z 2023-01-11T22:10:24.4300981Z Generating XML reports... 
2023-01-11T22:10:24.4301428Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214711.xml 2023-01-11T22:10:24.4301790Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4301964Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4302338Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4302525Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4302544Z 2023-01-11T22:10:24.4302650Z Running tests... 2023-01-11T22:10:24.4302909Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4303195Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4303456Z test_all_gather_complex (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4303737Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32156 2023-01-11T22:10:24.4303958Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32157 2023-01-11T22:10:24.4304330Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4304504Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4304879Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4305070Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4305431Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4305590Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4305961Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4306145Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4306389Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4306630Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4307027Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4307465Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4307694Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4308033Z STAGE:2023-01-11 21:47:17 32156:32156 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4308244Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4308572Z STAGE:2023-01-11 21:47:17 32157:32157 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4308845Z [1673473637.749685] [7c5487d9c02b:32157:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4309077Z [1673473639.370029] [7c5487d9c02b:32157:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4309312Z [1673473639.370029] [7c5487d9c02b:32157:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4309577Z [1673473637.747497] [7c5487d9c02b:32156:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.4309869Z [1673473639.388962] [7c5487d9c02b:32156:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4310105Z [1673473639.388962] [7c5487d9c02b:32156:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4310655Z STAGE:2023-01-11 21:47:19 32157:32157 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:47:19 32156:32156 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4310675Z 2023-01-11T22:10:24.4311022Z STAGE:2023-01-11 21:47:19 32156:32156 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4311362Z STAGE:2023-01-11 21:47:19 32157:32157 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4311669Z STAGE:2023-01-11 21:47:19 32157:32157 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4311988Z STAGE:2023-01-11 21:47:19 32156:32156 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4312362Z STAGE:2023-01-11 21:47:19 32157:32157 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4312700Z STAGE:2023-01-11 21:47:19 32156:32156 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4313040Z STAGE:2023-01-11 21:47:19 32157:32157 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4313383Z STAGE:2023-01-11 21:47:19 32156:32156 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4313484Z ok (6.519s) 2023-01-11T22:10:24.4313504Z 2023-01-11T22:10:24.4313765Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4313860Z Ran 1 test in 6.520s 2023-01-11T22:10:24.4313897Z 2023-01-11T22:10:24.4313971Z OK 2023-01-11T22:10:24.4313993Z 2023-01-11T22:10:24.4314115Z Generating XML reports... 
2023-01-11T22:10:24.4314557Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214713.xml 2023-01-11T22:10:24.4314928Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4315102Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4315482Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4315671Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4315691Z 2023-01-11T22:10:24.4315797Z Running tests... 2023-01-11T22:10:24.4316041Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4316347Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4316609Z test_all_gather_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all gather (0.002s) 2023-01-11T22:10:24.4316629Z 2023-01-11T22:10:24.4316889Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4316999Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4317018Z 2023-01-11T22:10:24.4317124Z OK (skipped=1) 2023-01-11T22:10:24.4317143Z 2023-01-11T22:10:24.4317263Z Generating XML reports... 
2023-01-11T22:10:24.4317707Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214722.xml 2023-01-11T22:10:24.4318075Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4318232Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4318605Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4318852Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4318871Z 2023-01-11T22:10:24.4318978Z Running tests... 2023-01-11T22:10:24.4319244Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4319550Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4319820Z test_all_gather_cuda_complex (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all gather (0.002s) 2023-01-11T22:10:24.4319839Z 2023-01-11T22:10:24.4320100Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4320193Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4320231Z 2023-01-11T22:10:24.4320319Z OK (skipped=1) 2023-01-11T22:10:24.4320337Z 2023-01-11T22:10:24.4320459Z Generating XML reports... 
2023-01-11T22:10:24.4320899Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214725.xml 2023-01-11T22:10:24.4321270Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4321442Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4321859Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4322053Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4322073Z 2023-01-11T22:10:24.4322181Z Running tests... 2023-01-11T22:10:24.4322425Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4322728Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4322986Z test_all_gather_full_group (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4323203Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32336 2023-01-11T22:10:24.4323421Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32337 2023-01-11T22:10:24.4323788Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4323961Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4324334Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4324505Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4324865Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4325033Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4325398Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4325583Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4325827Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4326071Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4326468Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4326858Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4327069Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4327303Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:10:24.4327528Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4327766Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:10:24.4328239Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.4328572Z STAGE:2023-01-11 21:47:31 32336:32336 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4328962Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.4329289Z STAGE:2023-01-11 21:47:31 32337:32337 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4329564Z [1673473651.575195] [7c5487d9c02b:32337:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4329794Z [1673473653.188341] [7c5487d9c02b:32337:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4330016Z [1673473653.188341] [7c5487d9c02b:32337:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4330327Z [1673473651.554497] [7c5487d9c02b:32336:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.4330559Z [1673473653.217300] [7c5487d9c02b:32336:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4330794Z [1673473653.217300] [7c5487d9c02b:32336:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4331338Z STAGE:2023-01-11 21:47:33 32337:32337 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:47:33 32336:32336 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4331358Z 2023-01-11T22:10:24.4331704Z STAGE:2023-01-11 21:47:33 32337:32337 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4332052Z STAGE:2023-01-11 21:47:33 32336:32336 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4332380Z STAGE:2023-01-11 21:47:33 32337:32337 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4332697Z STAGE:2023-01-11 21:47:33 32336:32336 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4333025Z STAGE:2023-01-11 21:47:33 32337:32337 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4333336Z STAGE:2023-01-11 21:47:33 32336:32336 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4333676Z STAGE:2023-01-11 21:47:33 32337:32337 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4334014Z STAGE:2023-01-11 21:47:33 32336:32336 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4334115Z ok (6.558s) 2023-01-11T22:10:24.4334135Z 2023-01-11T22:10:24.4334399Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4334511Z Ran 1 test in 6.558s 2023-01-11T22:10:24.4334530Z 2023-01-11T22:10:24.4334622Z OK 2023-01-11T22:10:24.4334641Z 2023-01-11T22:10:24.4334768Z Generating XML reports... 
2023-01-11T22:10:24.4335212Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214727.xml 2023-01-11T22:10:24.4335561Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4335734Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4336109Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4336299Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4336318Z 2023-01-11T22:10:24.4336425Z Running tests... 2023-01-11T22:10:24.4336987Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4337305Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4337570Z test_all_gather_group (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4337771Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32450 2023-01-11T22:10:24.4337988Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32451 2023-01-11T22:10:24.4338355Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4338527Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4338903Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4339091Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4339453Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4339697Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4340088Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4340256Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4340499Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4340742Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4341140Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4341531Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4341761Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4341989Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4342145Z skip: Skipped due to small world size. (4.204s) 2023-01-11T22:10:24.4342165Z 2023-01-11T22:10:24.4342425Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4342518Z Ran 1 test in 4.205s 2023-01-11T22:10:24.4342538Z 2023-01-11T22:10:24.4342643Z OK (skipped=1) 2023-01-11T22:10:24.4342662Z 2023-01-11T22:10:24.4342785Z Generating XML reports... 2023-01-11T22:10:24.4343224Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214736.xml 2023-01-11T22:10:24.4343589Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4343766Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4344140Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4344331Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4344350Z 2023-01-11T22:10:24.4344439Z Running tests... 2023-01-11T22:10:24.4344698Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4345004Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4345295Z test_all_gather_into_cat_tensor_cuda (__main__.TestDistBackendWithSpawn) ... 
skip: Only Nccl supports CUDA all_gather_into_tensor (0.002s) 2023-01-11T22:10:24.4345315Z 2023-01-11T22:10:24.4345573Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4345683Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4345703Z 2023-01-11T22:10:24.4345881Z OK (skipped=1) 2023-01-11T22:10:24.4345900Z 2023-01-11T22:10:24.4346022Z Generating XML reports... 2023-01-11T22:10:24.4346470Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214743.xml 2023-01-11T22:10:24.4346818Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4346991Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4347361Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4347551Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4347571Z 2023-01-11T22:10:24.4347677Z Running tests... 2023-01-11T22:10:24.4347938Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4348246Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4348544Z test_all_gather_into_stack_tensor_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all_gather_into_tensor (0.002s) 2023-01-11T22:10:24.4348564Z 2023-01-11T22:10:24.4348866Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4348966Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4348985Z 2023-01-11T22:10:24.4349092Z OK (skipped=1) 2023-01-11T22:10:24.4349111Z 2023-01-11T22:10:24.4349234Z Generating XML reports... 
2023-01-11T22:10:24.4349680Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214745.xml 2023-01-11T22:10:24.4350044Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4350215Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4350588Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4350780Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4350800Z 2023-01-11T22:10:24.4350906Z Running tests... 2023-01-11T22:10:24.4351150Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4351456Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4351736Z test_all_gather_multigpu (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl backend supports allgather multigpu (0.002s) 2023-01-11T22:10:24.4351755Z 2023-01-11T22:10:24.4352008Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4352118Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4352137Z 2023-01-11T22:10:24.4352243Z OK (skipped=1) 2023-01-11T22:10:24.4352263Z 2023-01-11T22:10:24.4352383Z Generating XML reports... 
2023-01-11T22:10:24.4352821Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214748.xml 2023-01-11T22:10:24.4353169Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4353346Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4353716Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4353904Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4353923Z 2023-01-11T22:10:24.4354029Z Running tests... 2023-01-11T22:10:24.4354289Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4354594Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4354886Z test_all_gather_multigpu_complex (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl backend supports allgather multigpu (0.002s) 2023-01-11T22:10:24.4354958Z 2023-01-11T22:10:24.4355224Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4355317Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4355336Z 2023-01-11T22:10:24.4355447Z OK (skipped=1) 2023-01-11T22:10:24.4355466Z 2023-01-11T22:10:24.4355590Z Generating XML reports... 
2023-01-11T22:10:24.4356028Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214750.xml 2023-01-11T22:10:24.4356390Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4356564Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4356934Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4357121Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4357140Z 2023-01-11T22:10:24.4357251Z Running tests... 2023-01-11T22:10:24.4357492Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4357855Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4358134Z test_all_gather_object_default_pg (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4358354Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32685 2023-01-11T22:10:24.4358567Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32686 2023-01-11T22:10:24.4358934Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4359109Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4359479Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4359654Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4360011Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4360185Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4360555Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4360741Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4360983Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4361222Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4361616Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4362007Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4362221Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4362446Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4362718Z [1673473676.919157] [7c5487d9c02b:32685:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4362947Z [1673473678.357422] [7c5487d9c02b:32685:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4363182Z [1673473678.357422] [7c5487d9c02b:32685:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4363451Z [1673473676.939909] [7c5487d9c02b:32686:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4363735Z [1673473678.346395] [7c5487d9c02b:32686:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4363973Z [1673473678.346395] [7c5487d9c02b:32686:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4364075Z ok (7.244s) 2023-01-11T22:10:24.4364096Z 2023-01-11T22:10:24.4364361Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4364456Z Ran 1 test in 7.244s 2023-01-11T22:10:24.4364475Z 2023-01-11T22:10:24.4364566Z OK 2023-01-11T22:10:24.4364586Z 2023-01-11T22:10:24.4364710Z Generating XML reports... 
2023-01-11T22:10:24.4365148Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214753.xml 2023-01-11T22:10:24.4365510Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4365687Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4366062Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4366297Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4366318Z 2023-01-11T22:10:24.4366413Z Running tests... 2023-01-11T22:10:24.4366676Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4366984Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4367253Z test_all_gather_object_subgroup (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4367472Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32796 2023-01-11T22:10:24.4367687Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32797 2023-01-11T22:10:24.4368047Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4368225Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4368599Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4368770Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4369129Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4369299Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4369667Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4369854Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4370095Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4370340Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4370736Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4371111Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4371338Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4371564Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4371801Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:10:24.4372041Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:10:24.4372434Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.4372887Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.4373128Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:10:24.4373362Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:10:24.4373744Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:10:24.4374115Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:10:24.4374352Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2023-01-11T22:10:24.4374584Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2023-01-11T22:10:24.4374971Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:10:24.4375399Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 
2023-01-11T22:10:24.4375678Z [1673473686.658937] [7c5487d9c02b:32796:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4375911Z [1673473688.086160] [7c5487d9c02b:32796:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4376147Z [1673473688.086160] [7c5487d9c02b:32796:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4376416Z [1673473686.679648] [7c5487d9c02b:32797:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4376873Z [1673473688.060788] [7c5487d9c02b:32797:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4377098Z [1673473688.060788] [7c5487d9c02b:32797:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4377203Z ok (7.613s) 2023-01-11T22:10:24.4377225Z 2023-01-11T22:10:24.4377501Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4377613Z Ran 1 test in 7.614s 2023-01-11T22:10:24.4377632Z 2023-01-11T22:10:24.4377725Z OK 2023-01-11T22:10:24.4377744Z 2023-01-11T22:10:24.4377867Z Generating XML reports... 
2023-01-11T22:10:24.4378312Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214802.xml 2023-01-11T22:10:24.4378677Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4378837Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4379214Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4379406Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4379425Z 2023-01-11T22:10:24.4379534Z Running tests... 2023-01-11T22:10:24.4379793Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4380100Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4380355Z test_all_gather_v_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports all_gather_v (0.002s) 2023-01-11T22:10:24.4380375Z 2023-01-11T22:10:24.4380632Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4380742Z Ran 1 test in 0.003s 2023-01-11T22:10:24.4380762Z 2023-01-11T22:10:24.4380851Z OK (skipped=1) 2023-01-11T22:10:24.4380948Z 2023-01-11T22:10:24.4381079Z Generating XML reports... 
2023-01-11T22:10:24.4381528Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214812.xml 2023-01-11T22:10:24.4381896Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4382068Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4382442Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4382630Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4382649Z 2023-01-11T22:10:24.4382756Z Running tests... 2023-01-11T22:10:24.4383013Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4383305Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4383721Z test_all_reduce_coalesced_full_group_max (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T22:10:24.4383741Z 2023-01-11T22:10:24.4384055Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4384174Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4384194Z 2023-01-11T22:10:24.4384301Z OK (skipped=1) 2023-01-11T22:10:24.4384320Z 2023-01-11T22:10:24.4384442Z Generating XML reports... 
2023-01-11T22:10:24.4384883Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214815.xml 2023-01-11T22:10:24.4385248Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4385421Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4385777Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4385970Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4385989Z 2023-01-11T22:10:24.4386096Z Running tests... 2023-01-11T22:10:24.4386358Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4386664Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4387074Z test_all_reduce_coalesced_full_group_min (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T22:10:24.4387094Z 2023-01-11T22:10:24.4387347Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4387456Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4387476Z 2023-01-11T22:10:24.4387565Z OK (skipped=1) 2023-01-11T22:10:24.4387604Z 2023-01-11T22:10:24.4387709Z Generating XML reports... 
2023-01-11T22:10:24.4388146Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214817.xml 2023-01-11T22:10:24.4388509Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4388685Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4389058Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4389244Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4389263Z 2023-01-11T22:10:24.4389370Z Running tests... 2023-01-11T22:10:24.4389628Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4389914Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4390335Z test_all_reduce_coalesced_full_group_product (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T22:10:24.4390405Z 2023-01-11T22:10:24.4390674Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4390783Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4390802Z 2023-01-11T22:10:24.4390908Z OK (skipped=1) 2023-01-11T22:10:24.4390927Z 2023-01-11T22:10:24.4391048Z Generating XML reports... 
2023-01-11T22:10:24.4391483Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214820.xml 2023-01-11T22:10:24.4391847Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4392019Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4392374Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4392562Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4392584Z 2023-01-11T22:10:24.4392690Z Running tests... 2023-01-11T22:10:24.4392947Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4393300Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4393722Z test_all_reduce_coalesced_full_group_sum (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T22:10:24.4393742Z 2023-01-11T22:10:24.4393994Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4394103Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4394122Z 2023-01-11T22:10:24.4394228Z OK (skipped=1) 2023-01-11T22:10:24.4394247Z 2023-01-11T22:10:24.4394351Z Generating XML reports... 
2023-01-11T22:10:24.4394784Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214822.xml 2023-01-11T22:10:24.4395148Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4395325Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4395699Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4395887Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4395907Z 2023-01-11T22:10:24.4396013Z Running tests... 2023-01-11T22:10:24.4396270Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4396559Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4396963Z test_all_reduce_coalesced_group_max (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T22:10:24.4396982Z 2023-01-11T22:10:24.4397238Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4397348Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4397370Z 2023-01-11T22:10:24.4397477Z OK (skipped=1) 2023-01-11T22:10:24.4397496Z 2023-01-11T22:10:24.4397617Z Generating XML reports... 
2023-01-11T22:10:24.4398059Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214824.xml 2023-01-11T22:10:24.4398420Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4398591Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4398946Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4399130Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4399150Z 2023-01-11T22:10:24.4399256Z Running tests... 2023-01-11T22:10:24.4399515Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4399882Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4400287Z test_all_reduce_coalesced_group_min (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T22:10:24.4400308Z 2023-01-11T22:10:24.4400565Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4400675Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4400694Z 2023-01-11T22:10:24.4400800Z OK (skipped=1) 2023-01-11T22:10:24.4400820Z 2023-01-11T22:10:24.4400924Z Generating XML reports... 
2023-01-11T22:10:24.4401360Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214827.xml 2023-01-11T22:10:24.4401722Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4401894Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4402265Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4402457Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4402476Z 2023-01-11T22:10:24.4402631Z Running tests... 2023-01-11T22:10:24.4402900Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4403206Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4403602Z test_all_reduce_coalesced_group_product (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T22:10:24.4403622Z 2023-01-11T22:10:24.4403876Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4403985Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4404004Z 2023-01-11T22:10:24.4404112Z OK (skipped=1) 2023-01-11T22:10:24.4404130Z 2023-01-11T22:10:24.4404252Z Generating XML reports... 
2023-01-11T22:10:24.4404693Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214829.xml 2023-01-11T22:10:24.4405060Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4405231Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4405603Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4405773Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4405792Z 2023-01-11T22:10:24.4405898Z Running tests... 2023-01-11T22:10:24.4406156Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4406461Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4406865Z test_all_reduce_coalesced_group_sum (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T22:10:24.4406888Z 2023-01-11T22:10:24.4407146Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4407306Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4407326Z 2023-01-11T22:10:24.4407438Z OK (skipped=1) 2023-01-11T22:10:24.4407457Z 2023-01-11T22:10:24.4407562Z Generating XML reports... 
2023-01-11T22:10:24.4408003Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214832.xml 2023-01-11T22:10:24.4408367Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4408540Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4408913Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4409102Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4409178Z 2023-01-11T22:10:24.4409290Z Running tests... 2023-01-11T22:10:24.4409551Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4409860Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4410233Z test_all_reduce_coalesced_max (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T22:10:24.4410270Z 2023-01-11T22:10:24.4410508Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4410619Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4410638Z 2023-01-11T22:10:24.4410744Z OK (skipped=1) 2023-01-11T22:10:24.4410762Z 2023-01-11T22:10:24.4410884Z Generating XML reports... 
2023-01-11T22:10:24.4411318Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214834.xml 2023-01-11T22:10:24.4411680Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4411855Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4412273Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4412450Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4412469Z 2023-01-11T22:10:24.4412576Z Running tests... 2023-01-11T22:10:24.4412836Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4413139Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4413431Z test_all_reduce_coalesced_max_complex_unsupported (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4413649Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33258 2023-01-11T22:10:24.4413868Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33259 2023-01-11T22:10:24.4414234Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4414394Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4414766Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4414953Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4415312Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4415480Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4415850Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4416035Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4416279Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4416524Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4417319Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4417715Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4417945Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4418171Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4418903Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1714: UserWarning: torch.distributed.all_reduce_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2023-01-11T22:10:24.4419108Z warnings.warn( 2023-01-11T22:10:24.4419842Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1714: UserWarning: torch.distributed.all_reduce_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2023-01-11T22:10:24.4419952Z warnings.warn( 2023-01-11T22:10:24.4420052Z ok (4.185s) 2023-01-11T22:10:24.4420072Z 2023-01-11T22:10:24.4420334Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4420426Z Ran 1 test in 4.185s 2023-01-11T22:10:24.4420446Z 2023-01-11T22:10:24.4420538Z OK 2023-01-11T22:10:24.4420558Z 2023-01-11T22:10:24.4420681Z Generating XML reports... 
2023-01-11T22:10:24.4421120Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214836.xml 2023-01-11T22:10:24.4421486Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4421730Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4422117Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4422307Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4422327Z 2023-01-11T22:10:24.4422417Z Running tests... 2023-01-11T22:10:24.4422677Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4422982Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4423370Z test_all_reduce_coalesced_min (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T22:10:24.4423393Z 2023-01-11T22:10:24.4423650Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4423760Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4423780Z 2023-01-11T22:10:24.4423886Z OK (skipped=1) 2023-01-11T22:10:24.4423908Z 2023-01-11T22:10:24.4424029Z Generating XML reports... 
2023-01-11T22:10:24.4424467Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214843.xml 2023-01-11T22:10:24.4424812Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4424983Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4425350Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4425538Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4425557Z 2023-01-11T22:10:24.4425663Z Running tests... 2023-01-11T22:10:24.4425923Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4426226Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4426630Z test_all_reduce_coalesced_product (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T22:10:24.4426650Z 2023-01-11T22:10:24.4426908Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4427000Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4427019Z 2023-01-11T22:10:24.4427124Z OK (skipped=1) 2023-01-11T22:10:24.4427143Z 2023-01-11T22:10:24.4427264Z Generating XML reports... 
2023-01-11T22:10:24.4427705Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214846.xml 2023-01-11T22:10:24.4428068Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4428297Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4428679Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4428871Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4428891Z 2023-01-11T22:10:24.4428998Z Running tests... 2023-01-11T22:10:24.4429239Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4429547Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4429938Z test_all_reduce_coalesced_sum (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T22:10:24.4429958Z 2023-01-11T22:10:24.4430214Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4430325Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4430344Z 2023-01-11T22:10:24.4430453Z OK (skipped=1) 2023-01-11T22:10:24.4430472Z 2023-01-11T22:10:24.4430593Z Generating XML reports... 
2023-01-11T22:10:24.4431073Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214848.xml 2023-01-11T22:10:24.4431426Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4431602Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4431971Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4432159Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4432178Z 2023-01-11T22:10:24.4432286Z Running tests... 2023-01-11T22:10:24.4432542Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4432849Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4433134Z test_all_reduce_complex_unsupported_ops (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4433355Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33460 2023-01-11T22:10:24.4433553Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33461 2023-01-11T22:10:24.4433920Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4434093Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4434466Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4434655Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4435014Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4435190Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4435564Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4435733Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4435977Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4436218Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4436615Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4437005Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4437230Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4437510Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4437611Z ok (4.237s) 2023-01-11T22:10:24.4437631Z 2023-01-11T22:10:24.4437896Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4437991Z Ran 1 test in 4.237s 2023-01-11T22:10:24.4438026Z 2023-01-11T22:10:24.4438101Z OK 2023-01-11T22:10:24.4438119Z 2023-01-11T22:10:24.4438242Z Generating XML reports... 2023-01-11T22:10:24.4438681Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214850.xml 2023-01-11T22:10:24.4439046Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4439222Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4439594Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4439786Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4439805Z 2023-01-11T22:10:24.4439912Z Running tests... 2023-01-11T22:10:24.4440200Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4440519Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4440785Z test_all_reduce_full_group_max (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4441003Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33563 2023-01-11T22:10:24.4441218Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33564 2023-01-11T22:10:24.4441581Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4441753Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4442126Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4442297Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4442659Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4442833Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4443200Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4443383Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4443626Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4443866Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4444262Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4444660Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4444869Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4445105Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:10:24.4445326Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4445557Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:10:24.4445949Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.4446333Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.4446728Z STAGE:2023-01-11 21:49:01 33564:33564 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4447052Z STAGE:2023-01-11 21:49:01 33563:33563 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4447326Z [1673473741.619172] [7c5487d9c02b:33563:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4447539Z [1673473743.263217] [7c5487d9c02b:33563:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4447775Z [1673473743.263217] [7c5487d9c02b:33563:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4448043Z [1673473741.639329] [7c5487d9c02b:33564:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.4448274Z [1673473743.299667] [7c5487d9c02b:33564:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4448552Z [1673473743.299667] [7c5487d9c02b:33564:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4449104Z STAGE:2023-01-11 21:49:03 33563:33563 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:49:03 33564:33564 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4449124Z 2023-01-11T22:10:24.4449468Z STAGE:2023-01-11 21:49:03 33563:33563 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4449808Z STAGE:2023-01-11 21:49:03 33564:33564 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4450130Z STAGE:2023-01-11 21:49:03 33563:33563 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4450445Z STAGE:2023-01-11 21:49:03 33564:33564 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4450780Z STAGE:2023-01-11 21:49:03 33563:33563 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4451312Z STAGE:2023-01-11 21:49:03 33563:33563 ActivityProfilerController.cpp:310] Completed Stage: Post ProcessingSTAGE:2023-01-11 21:49:03 33564:33564 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4451350Z 2023-01-11T22:10:24.4451672Z STAGE:2023-01-11 21:49:03 33564:33564 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4451773Z ok (6.667s) 2023-01-11T22:10:24.4451791Z 2023-01-11T22:10:24.4452055Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4452166Z Ran 1 test in 6.668s 2023-01-11T22:10:24.4452186Z 2023-01-11T22:10:24.4452279Z OK 2023-01-11T22:10:24.4452298Z 2023-01-11T22:10:24.4452425Z Generating XML reports... 
2023-01-11T22:10:24.4452864Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214857.xml 2023-01-11T22:10:24.4453236Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4453393Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4453767Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4453956Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4453975Z 2023-01-11T22:10:24.4454083Z Running tests... 2023-01-11T22:10:24.4454343Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4454650Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4454914Z test_all_reduce_full_group_min (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4455192Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33677 2023-01-11T22:10:24.4455408Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33678 2023-01-11T22:10:24.4455762Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4455935Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4456310Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4456497Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4457111Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4457283Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4457652Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4457843Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4458148Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4458402Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4458799Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4459188Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4459415Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4459652Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:10:24.4459877Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4460118Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:10:24.4460514Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.4460885Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.4461221Z STAGE:2023-01-11 21:49:10 33678:33678 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4461547Z STAGE:2023-01-11 21:49:10 33677:33677 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4461820Z [1673473750.847054] [7c5487d9c02b:33677:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4462052Z [1673473752.497862] [7c5487d9c02b:33677:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4462293Z [1673473752.497862] [7c5487d9c02b:33677:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4462561Z [1673473750.867718] [7c5487d9c02b:33678:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.4462788Z [1673473752.503760] [7c5487d9c02b:33678:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4463021Z [1673473752.503760] [7c5487d9c02b:33678:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4463566Z STAGE:2023-01-11 21:49:12 33677:33677 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:49:12 33678:33678 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4463649Z 2023-01-11T22:10:24.4464005Z STAGE:2023-01-11 21:49:12 33678:33678 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4464334Z STAGE:2023-01-11 21:49:12 33677:33677 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4464658Z STAGE:2023-01-11 21:49:13 33678:33678 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4464973Z STAGE:2023-01-11 21:49:13 33677:33677 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4465300Z STAGE:2023-01-11 21:49:13 33678:33678 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4465634Z STAGE:2023-01-11 21:49:13 33677:33677 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4465975Z STAGE:2023-01-11 21:49:13 33678:33678 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4466319Z STAGE:2023-01-11 21:49:13 33677:33677 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4466424Z ok (6.655s) 2023-01-11T22:10:24.4466443Z 2023-01-11T22:10:24.4466703Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4466876Z Ran 1 test in 6.655s 2023-01-11T22:10:24.4466897Z 2023-01-11T22:10:24.4466994Z OK 2023-01-11T22:10:24.4467013Z 2023-01-11T22:10:24.4467138Z Generating XML reports... 
2023-01-11T22:10:24.4467584Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214906.xml 2023-01-11T22:10:24.4467947Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4468124Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4468497Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4468685Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4468709Z 2023-01-11T22:10:24.4468799Z Running tests... 2023-01-11T22:10:24.4469056Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4469363Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4469629Z test_all_reduce_full_group_product (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4469838Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33791 2023-01-11T22:10:24.4470046Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33792 2023-01-11T22:10:24.4470407Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4470571Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4470943Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4471117Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4471473Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4471640Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4472005Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4472182Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4472418Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4472653Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4473041Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4473485Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4473696Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4473915Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4474143Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:10:24.4474373Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:10:24.4474760Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.4475144Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.4475471Z STAGE:2023-01-11 21:49:19 33792:33792 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4475787Z STAGE:2023-01-11 21:49:19 33791:33791 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4476103Z [1673473759.906614] [7c5487d9c02b:33792:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4476323Z [1673473761.531834] [7c5487d9c02b:33792:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4476555Z [1673473761.531834] [7c5487d9c02b:33792:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4476818Z [1673473759.885770] [7c5487d9c02b:33791:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.4477037Z [1673473761.574051] [7c5487d9c02b:33791:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4477268Z [1673473761.574051] [7c5487d9c02b:33791:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4477820Z STAGE:2023-01-11 21:49:21 33792:33792 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:49:21 33791:33791 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4477841Z 2023-01-11T22:10:24.4478178Z STAGE:2023-01-11 21:49:21 33792:33792 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4478514Z STAGE:2023-01-11 21:49:21 33791:33791 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4478829Z STAGE:2023-01-11 21:49:22 33791:33791 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4479145Z STAGE:2023-01-11 21:49:22 33792:33792 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4479460Z STAGE:2023-01-11 21:49:22 33791:33791 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4479779Z STAGE:2023-01-11 21:49:22 33792:33792 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4480116Z STAGE:2023-01-11 21:49:22 33791:33791 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4480451Z STAGE:2023-01-11 21:49:22 33792:33792 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4480545Z ok (6.619s) 2023-01-11T22:10:24.4480564Z 2023-01-11T22:10:24.4480823Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4480924Z Ran 1 test in 6.620s 2023-01-11T22:10:24.4480944Z 2023-01-11T22:10:24.4481029Z OK 2023-01-11T22:10:24.4481048Z 2023-01-11T22:10:24.4481162Z Generating XML reports... 
2023-01-11T22:10:24.4481589Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214916.xml 2023-01-11T22:10:24.4482008Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4482177Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4482547Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4482726Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4482745Z 2023-01-11T22:10:24.4482844Z Running tests... 2023-01-11T22:10:24.4483095Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4483394Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4483643Z test_all_reduce_full_group_sum (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4483853Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33905 2023-01-11T22:10:24.4484062Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33906 2023-01-11T22:10:24.4484473Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4484641Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4485008Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4485188Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4485533Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4485693Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4486046Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4486228Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4486459Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4486695Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4487083Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4487463Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4487680Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4487909Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:10:24.4488120Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4488339Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:10:24.4488724Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.4489047Z STAGE:2023-01-11 21:49:29 33905:33905 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4489424Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.4489741Z STAGE:2023-01-11 21:49:29 33906:33906 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4490003Z [1673473769.120608] [7c5487d9c02b:33905:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4490222Z [1673473770.743252] [7c5487d9c02b:33905:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4490504Z [1673473770.743252] [7c5487d9c02b:33905:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4490767Z [1673473769.140581] [7c5487d9c02b:33906:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.4490983Z [1673473770.763548] [7c5487d9c02b:33906:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4491198Z [1673473770.763548] [7c5487d9c02b:33906:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4491742Z STAGE:2023-01-11 21:49:31 33905:33905 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:49:31 33906:33906 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4491763Z 2023-01-11T22:10:24.4492102Z STAGE:2023-01-11 21:49:31 33905:33905 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4492443Z STAGE:2023-01-11 21:49:31 33906:33906 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4492805Z STAGE:2023-01-11 21:49:31 33905:33905 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4493127Z STAGE:2023-01-11 21:49:31 33906:33906 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4493452Z STAGE:2023-01-11 21:49:31 33905:33905 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4493786Z STAGE:2023-01-11 21:49:31 33905:33905 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4494103Z STAGE:2023-01-11 21:49:31 33906:33906 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4494431Z STAGE:2023-01-11 21:49:31 33906:33906 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4494515Z ok (6.523s) 2023-01-11T22:10:24.4494538Z 2023-01-11T22:10:24.4494794Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4494897Z Ran 1 test in 6.523s 2023-01-11T22:10:24.4494917Z 2023-01-11T22:10:24.4495000Z OK 2023-01-11T22:10:24.4495022Z 2023-01-11T22:10:24.4495135Z Generating XML reports... 
2023-01-11T22:10:24.4495570Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214925.xml 2023-01-11T22:10:24.4495927Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4496092Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4496450Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4496866Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4496889Z 2023-01-11T22:10:24.4496996Z Running tests... 2023-01-11T22:10:24.4497265Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4497569Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4497823Z test_all_reduce_group_max (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4498037Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34019 2023-01-11T22:10:24.4498246Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34020 2023-01-11T22:10:24.4498606Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4498762Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4499135Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4499322Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4499770Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4499937Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4500299Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4500475Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4500709Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4500930Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4501314Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4501695Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4501917Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4502191Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4502350Z skip: Skipped due to small world size. (4.339s) 2023-01-11T22:10:24.4502370Z 2023-01-11T22:10:24.4502628Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4502730Z Ran 1 test in 4.339s 2023-01-11T22:10:24.4502749Z 2023-01-11T22:10:24.4502847Z OK (skipped=1) 2023-01-11T22:10:24.4502867Z 2023-01-11T22:10:24.4502969Z Generating XML reports... 2023-01-11T22:10:24.4503400Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214934.xml 2023-01-11T22:10:24.4503755Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4503925Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4504294Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4504471Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4504490Z 2023-01-11T22:10:24.4504590Z Running tests... 2023-01-11T22:10:24.4504841Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4505137Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4505378Z test_all_reduce_group_min (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4505587Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34122 2023-01-11T22:10:24.4505788Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34123 2023-01-11T22:10:24.4506143Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4506307Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4506672Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4506849Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4507193Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4507393Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4507758Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4507932Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4508165Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4508457Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4508848Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4509231Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4509450Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4509665Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4509803Z skip: Skipped due to small world size. (4.207s) 2023-01-11T22:10:24.4509823Z 2023-01-11T22:10:24.4510073Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4510178Z Ran 1 test in 4.207s 2023-01-11T22:10:24.4510197Z 2023-01-11T22:10:24.4510299Z OK (skipped=1) 2023-01-11T22:10:24.4510318Z 2023-01-11T22:10:24.4510430Z Generating XML reports... 2023-01-11T22:10:24.4510907Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214941.xml 2023-01-11T22:10:24.4511274Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4511440Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4511803Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4511974Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4511993Z 2023-01-11T22:10:24.4512092Z Running tests... 2023-01-11T22:10:24.4512342Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4512637Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4512894Z test_all_reduce_group_product (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4513104Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34225 2023-01-11T22:10:24.4513309Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34226 2023-01-11T22:10:24.4513665Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4513821Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4514182Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4514359Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4514706Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4514868Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4515229Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4515403Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4515634Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4515856Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4516239Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4516619Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4516837Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4517107Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4517252Z skip: Skipped due to small world size. (4.364s) 2023-01-11T22:10:24.4517272Z 2023-01-11T22:10:24.4517532Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4517633Z Ran 1 test in 4.364s 2023-01-11T22:10:24.4517653Z 2023-01-11T22:10:24.4517749Z OK (skipped=1) 2023-01-11T22:10:24.4517768Z 2023-01-11T22:10:24.4517872Z Generating XML reports... 2023-01-11T22:10:24.4518304Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214947.xml 2023-01-11T22:10:24.4518662Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4518828Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4519194Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4519377Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4519398Z 2023-01-11T22:10:24.4519497Z Running tests... 2023-01-11T22:10:24.4519792Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4520103Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4520345Z test_all_reduce_group_sum (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4520554Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34328 2023-01-11T22:10:24.4520760Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34329 2023-01-11T22:10:24.4521118Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4521284Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4521650Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4521832Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4522184Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4522339Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4522707Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4522889Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4523122Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4523355Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4523741Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4524123Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4524341Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4524555Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4524694Z skip: Skipped due to small world size. (4.200s) 2023-01-11T22:10:24.4524720Z 2023-01-11T22:10:24.4524968Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4525069Z Ran 1 test in 4.201s 2023-01-11T22:10:24.4525088Z 2023-01-11T22:10:24.4525186Z OK (skipped=1) 2023-01-11T22:10:24.4525205Z 2023-01-11T22:10:24.4525320Z Generating XML reports... 2023-01-11T22:10:24.4525754Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214954.xml 2023-01-11T22:10:24.4526173Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4526343Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4526706Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4526877Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4526897Z 2023-01-11T22:10:24.4526999Z Running tests... 2023-01-11T22:10:24.4527249Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4527547Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4527786Z test_all_reduce_max (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4527998Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34431 2023-01-11T22:10:24.4528209Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34432 2023-01-11T22:10:24.4528613Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4528777Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4529147Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4529325Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4529675Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4529840Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4530201Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4530383Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4530622Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4530849Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4531226Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4531606Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4531822Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4532148Z STAGE:2023-01-11 21:50:05 34431:34431 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4532365Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4532692Z STAGE:2023-01-11 21:50:05 34432:34432 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4532960Z [1673473805.514916] [7c5487d9c02b:34431:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4533182Z [1673473807.174154] [7c5487d9c02b:34431:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4533409Z [1673473807.174154] [7c5487d9c02b:34431:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4533659Z [1673473805.517437] [7c5487d9c02b:34432:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.4533882Z [1673473807.186717] [7c5487d9c02b:34432:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4534169Z [1673473807.186717] [7c5487d9c02b:34432:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4534714Z STAGE:2023-01-11 21:50:07 34431:34431 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:50:07 34432:34432 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4534735Z 2023-01-11T22:10:24.4535297Z STAGE:2023-01-11 21:50:07 34431:34431 ActivityProfilerController.cpp:310] Completed Stage: Post ProcessingSTAGE:2023-01-11 21:50:07 34432:34432 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4535317Z 2023-01-11T22:10:24.4535630Z STAGE:2023-01-11 21:50:07 34432:34432 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4535940Z STAGE:2023-01-11 21:50:07 34431:34431 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4536261Z STAGE:2023-01-11 21:50:07 34432:34432 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4536805Z STAGE:2023-01-11 21:50:07 34431:34431 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4537230Z STAGE:2023-01-11 21:50:07 34432:34432 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4537586Z STAGE:2023-01-11 21:50:07 34431:34431 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4537670Z ok (6.665s) 2023-01-11T22:10:24.4537700Z 2023-01-11T22:10:24.4537945Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4538047Z Ran 1 test in 6.665s 2023-01-11T22:10:24.4538067Z 2023-01-11T22:10:24.4538151Z OK 2023-01-11T22:10:24.4538170Z 2023-01-11T22:10:24.4538283Z Generating XML reports... 
2023-01-11T22:10:24.4538720Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215001.xml 2023-01-11T22:10:24.4539082Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4539249Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4539619Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4539792Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4539812Z 2023-01-11T22:10:24.4539912Z Running tests... 2023-01-11T22:10:24.4540167Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4540467Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4540707Z test_all_reduce_min (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4540918Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34545 2023-01-11T22:10:24.4541128Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34546 2023-01-11T22:10:24.4541485Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4541645Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4542013Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4542192Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4542544Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4542708Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4543067Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4543245Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4543555Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4543795Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4544177Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4544563Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4544780Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4545111Z STAGE:2023-01-11 21:50:14 34546:34546 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4545325Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4545638Z STAGE:2023-01-11 21:50:14 34545:34545 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4545968Z [1673473814.677125] [7c5487d9c02b:34546:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4546196Z [1673473816.298184] [7c5487d9c02b:34546:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4546428Z [1673473816.298184] [7c5487d9c02b:34546:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4546677Z [1673473814.656382] [7c5487d9c02b:34545:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.4546894Z [1673473816.319442] [7c5487d9c02b:34545:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4547118Z [1673473816.319442] [7c5487d9c02b:34545:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4547674Z STAGE:2023-01-11 21:50:16 34546:34546 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:50:16 34545:34545 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4547695Z 2023-01-11T22:10:24.4548031Z STAGE:2023-01-11 21:50:16 34546:34546 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4548363Z STAGE:2023-01-11 21:50:16 34545:34545 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4548677Z STAGE:2023-01-11 21:50:16 34546:34546 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4548990Z STAGE:2023-01-11 21:50:16 34545:34545 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4549313Z STAGE:2023-01-11 21:50:16 34546:34546 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4549629Z STAGE:2023-01-11 21:50:16 34545:34545 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4549963Z STAGE:2023-01-11 21:50:16 34546:34546 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4550290Z STAGE:2023-01-11 21:50:16 34545:34545 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4550386Z ok (6.663s) 2023-01-11T22:10:24.4550405Z 2023-01-11T22:10:24.4550657Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4550764Z Ran 1 test in 6.663s 2023-01-11T22:10:24.4550783Z 2023-01-11T22:10:24.4550870Z OK 2023-01-11T22:10:24.4550889Z 2023-01-11T22:10:24.4551006Z Generating XML reports... 
2023-01-11T22:10:24.4551442Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215010.xml 2023-01-11T22:10:24.4551801Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4552018Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4552391Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4552573Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4552593Z 2023-01-11T22:10:24.4552694Z Running tests... 2023-01-11T22:10:24.4552944Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4553242Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4553497Z test_all_reduce_multigpu (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4553703Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34659 2023-01-11T22:10:24.4553901Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34660 2023-01-11T22:10:24.4554263Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4554427Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4554840Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4555029Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4555388Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4555549Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4555912Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4556095Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4556318Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4556553Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4556947Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4557328Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4557545Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4557761Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4558088Z STAGE:2023-01-11 21:50:25 34660:34660 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4558837Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1582: UserWarning: torch.distributed.all_reduce_multigpu will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#multi-gpu-collective-functions 2023-01-11T22:10:24.4558946Z warnings.warn( 2023-01-11T22:10:24.4559256Z STAGE:2023-01-11 21:50:25 34659:34659 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4560004Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1582: UserWarning: torch.distributed.all_reduce_multigpu will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#multi-gpu-collective-functions 2023-01-11T22:10:24.4560111Z warnings.warn( 2023-01-11T22:10:24.4560371Z [1673473825.546452] [7c5487d9c02b:34660:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4560599Z [1673473825.560913] [7c5487d9c02b:34660:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4560887Z [1673473825.560913] [7c5487d9c02b:34660:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4561147Z [1673473825.540052] [7c5487d9c02b:34659:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.4561363Z [1673473825.555197] [7c5487d9c02b:34659:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4561590Z [1673473825.555197] [7c5487d9c02b:34659:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4562129Z STAGE:2023-01-11 21:50:25 34660:34660 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:50:25 34659:34659 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4562150Z 2023-01-11T22:10:24.4562487Z STAGE:2023-01-11 21:50:25 34660:34660 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4562889Z STAGE:2023-01-11 21:50:25 34659:34659 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4563251Z STAGE:2023-01-11 21:50:26 34660:34660 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4563574Z STAGE:2023-01-11 21:50:26 34659:34659 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4563895Z STAGE:2023-01-11 21:50:26 34660:34660 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4564228Z STAGE:2023-01-11 21:50:26 34660:34660 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4564547Z STAGE:2023-01-11 21:50:26 34659:34659 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4564875Z STAGE:2023-01-11 21:50:26 34659:34659 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4564976Z ok (6.765s) 2023-01-11T22:10:24.4564995Z 2023-01-11T22:10:24.4565246Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4565339Z Ran 1 test in 6.765s 2023-01-11T22:10:24.4565372Z 2023-01-11T22:10:24.4565449Z OK 2023-01-11T22:10:24.4565468Z 2023-01-11T22:10:24.4565583Z Generating XML reports... 
2023-01-11T22:10:24.4566022Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215019.xml 2023-01-11T22:10:24.4566382Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4566550Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4566921Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4567103Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4567126Z 2023-01-11T22:10:24.4567228Z Running tests... 2023-01-11T22:10:24.4567469Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4567770Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4568033Z test_all_reduce_multigpu_complex (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4568243Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34777 2023-01-11T22:10:24.4568451Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34778 2023-01-11T22:10:24.4568809Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4568974Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4569342Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4569570Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4569926Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4570093Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4570461Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4570639Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4570874Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4571111Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4571495Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4571882Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4572096Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4572361Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4572699Z STAGE:2023-01-11 21:50:34 34777:34777 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4573443Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1582: UserWarning: torch.distributed.all_reduce_multigpu will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#multi-gpu-collective-functions 2023-01-11T22:10:24.4573549Z warnings.warn( 2023-01-11T22:10:24.4573870Z STAGE:2023-01-11 21:50:34 34778:34778 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4574616Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1582: UserWarning: torch.distributed.all_reduce_multigpu will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#multi-gpu-collective-functions 2023-01-11T22:10:24.4574722Z warnings.warn( 2023-01-11T22:10:24.4574981Z [1673473834.808551] [7c5487d9c02b:34778:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4575203Z [1673473834.822915] [7c5487d9c02b:34778:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4575421Z [1673473834.822915] [7c5487d9c02b:34778:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4575749Z STAGE:2023-01-11 21:50:35 34778:34778 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4576007Z [1673473834.807153] [7c5487d9c02b:34777:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. 
Fall back to non cooperative launch. 2023-01-11T22:10:24.4576231Z [1673473834.821857] [7c5487d9c02b:34777:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4576459Z [1673473834.821857] [7c5487d9c02b:34777:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4577028Z STAGE:2023-01-11 21:50:35 34777:34777 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4577372Z STAGE:2023-01-11 21:50:35 34777:34777 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4577708Z STAGE:2023-01-11 21:50:35 34778:34778 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4578027Z STAGE:2023-01-11 21:50:35 34777:34777 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4578439Z STAGE:2023-01-11 21:50:35 34777:34777 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4578762Z STAGE:2023-01-11 21:50:35 34777:34777 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4579083Z STAGE:2023-01-11 21:50:35 34778:34778 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4579404Z STAGE:2023-01-11 21:50:35 34778:34778 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4579732Z STAGE:2023-01-11 21:50:35 34778:34778 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4579826Z ok (6.613s) 2023-01-11T22:10:24.4579846Z 2023-01-11T22:10:24.4580100Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4580203Z Ran 1 test in 6.613s 2023-01-11T22:10:24.4580222Z 2023-01-11T22:10:24.4580308Z OK 2023-01-11T22:10:24.4580327Z 2023-01-11T22:10:24.4580441Z Generating XML reports... 
2023-01-11T22:10:24.4580873Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215029.xml 2023-01-11T22:10:24.4581295Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4581475Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4581854Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4582037Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4582056Z 2023-01-11T22:10:24.4582158Z Running tests... 2023-01-11T22:10:24.4582407Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4582704Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4582943Z test_all_reduce_product (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4583158Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34895 2023-01-11T22:10:24.4583362Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34896 2023-01-11T22:10:24.4583723Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4583886Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4584254Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4584432Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4584792Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4584958Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4585308Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4585491Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4585732Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4585962Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4586353Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4586735Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4586954Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4587276Z STAGE:2023-01-11 21:50:42 34895:34895 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4587540Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4587862Z STAGE:2023-01-11 21:50:42 34896:34896 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4588134Z [1673473842.297447] [7c5487d9c02b:34895:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4588353Z [1673473843.956977] [7c5487d9c02b:34895:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4588584Z [1673473843.956977] [7c5487d9c02b:34895:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4588841Z [1673473842.298209] [7c5487d9c02b:34896:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.4589062Z [1673473843.967292] [7c5487d9c02b:34896:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4589288Z [1673473843.967292] [7c5487d9c02b:34896:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4589873Z STAGE:2023-01-11 21:50:44 34895:34895 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:50:44 34896:34896 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4589895Z 2023-01-11T22:10:24.4590242Z STAGE:2023-01-11 21:50:44 34895:34895 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4590579Z STAGE:2023-01-11 21:50:44 34896:34896 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4590884Z STAGE:2023-01-11 21:50:44 34896:34896 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4591193Z STAGE:2023-01-11 21:50:44 34895:34895 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4591516Z STAGE:2023-01-11 21:50:44 34896:34896 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4592057Z STAGE:2023-01-11 21:50:44 34896:34896 ActivityProfilerController.cpp:310] Completed Stage: Post ProcessingSTAGE:2023-01-11 21:50:44 34895:34895 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4592077Z 2023-01-11T22:10:24.4592407Z STAGE:2023-01-11 21:50:44 34895:34895 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4592503Z ok (6.608s) 2023-01-11T22:10:24.4592522Z 2023-01-11T22:10:24.4592776Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4592879Z Ran 1 test in 6.608s 2023-01-11T22:10:24.4592899Z 2023-01-11T22:10:24.4592982Z OK 2023-01-11T22:10:24.4593001Z 2023-01-11T22:10:24.4593116Z Generating XML reports... 
2023-01-11T22:10:24.4593541Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215038.xml 2023-01-11T22:10:24.4593905Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4594072Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4594443Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4594627Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4594647Z 2023-01-11T22:10:24.4594743Z Running tests... 2023-01-11T22:10:24.4594997Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4595298Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4595543Z test_all_reduce_result_cuda (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4595754Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35009 2023-01-11T22:10:24.4596022Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35010 2023-01-11T22:10:24.4596387Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4596550Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4596915Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4597102Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4597453Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4597614Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4597963Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4598145Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4598379Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4598657Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4599051Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4599435Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4599654Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4599870Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4600132Z [1673473852.854615] [7c5487d9c02b:35010:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4600346Z [1673473852.867856] [7c5487d9c02b:35010:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4600576Z [1673473852.867856] [7c5487d9c02b:35010:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4600837Z [1673473852.851610] [7c5487d9c02b:35009:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4601055Z [1673473852.865272] [7c5487d9c02b:35009:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4601286Z [1673473852.865272] [7c5487d9c02b:35009:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4601380Z ok (6.146s) 2023-01-11T22:10:24.4601400Z 2023-01-11T22:10:24.4601659Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4601770Z Ran 1 test in 6.146s 2023-01-11T22:10:24.4601790Z 2023-01-11T22:10:24.4601873Z OK 2023-01-11T22:10:24.4601892Z 2023-01-11T22:10:24.4601997Z Generating XML reports... 
2023-01-11T22:10:24.4602435Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215047.xml 2023-01-11T22:10:24.4602793Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4602960Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4603325Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4603509Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4603528Z 2023-01-11T22:10:24.4603626Z Running tests... 2023-01-11T22:10:24.4603874Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4604271Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4604507Z test_all_reduce_sum (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4604719Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35123 2023-01-11T22:10:24.4604924Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35124 2023-01-11T22:10:24.4605282Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4605446Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4605811Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4605991Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4606342Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4606501Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4606921Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4607104Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4607389Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4607628Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4608019Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4608411Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4608625Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4608947Z STAGE:2023-01-11 21:50:59 35124:35124 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4609173Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4609500Z STAGE:2023-01-11 21:51:00 35123:35123 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4609764Z [1673473860.112389] [7c5487d9c02b:35124:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4609986Z [1673473861.747114] [7c5487d9c02b:35124:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4610217Z [1673473861.747114] [7c5487d9c02b:35124:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4610486Z [1673473860.091263] [7c5487d9c02b:35123:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.4610705Z [1673473861.802206] [7c5487d9c02b:35123:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4610929Z [1673473861.802206] [7c5487d9c02b:35123:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4611460Z STAGE:2023-01-11 21:51:02 35124:35124 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:51:02 35123:35123 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4611494Z 2023-01-11T22:10:24.4611824Z STAGE:2023-01-11 21:51:02 35124:35124 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4612159Z STAGE:2023-01-11 21:51:02 35123:35123 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4612540Z STAGE:2023-01-11 21:51:02 35124:35124 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4612849Z STAGE:2023-01-11 21:51:02 35123:35123 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4613172Z STAGE:2023-01-11 21:51:02 35124:35124 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4613491Z STAGE:2023-01-11 21:51:02 35123:35123 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4613819Z STAGE:2023-01-11 21:51:02 35124:35124 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4614148Z STAGE:2023-01-11 21:51:02 35123:35123 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4614232Z ok (6.719s) 2023-01-11T22:10:24.4614259Z 2023-01-11T22:10:24.4614503Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4614607Z Ran 1 test in 6.720s 2023-01-11T22:10:24.4614629Z 2023-01-11T22:10:24.4614712Z OK 2023-01-11T22:10:24.4614731Z 2023-01-11T22:10:24.4614852Z Generating XML reports... 
2023-01-11T22:10:24.4615331Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215056.xml 2023-01-11T22:10:24.4615699Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4615864Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4616229Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4616402Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4616421Z 2023-01-11T22:10:24.4616519Z Running tests... 2023-01-11T22:10:24.4617007Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4617316Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4617571Z test_all_reduce_sum_async (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4617784Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35237 2023-01-11T22:10:24.4617997Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35238 2023-01-11T22:10:24.4618356Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4618513Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4618884Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4619064Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4619414Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4619581Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4619947Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4620122Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4620361Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4620593Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4620969Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4621356Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4621576Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4621995Z STAGE:2023-01-11 21:51:09 35237:35237 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4622220Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4622536Z STAGE:2023-01-11 21:51:09 35238:35238 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4622807Z [1673473869.387811] [7c5487d9c02b:35237:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4623025Z [1673473871.013026] [7c5487d9c02b:35237:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4623253Z [1673473871.013026] [7c5487d9c02b:35237:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4623511Z [1673473869.407856] [7c5487d9c02b:35238:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.4623724Z [1673473871.047228] [7c5487d9c02b:35238:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4624017Z [1673473871.047228] [7c5487d9c02b:35238:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4624571Z STAGE:2023-01-11 21:51:11 35237:35237 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:51:11 35238:35238 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4624591Z 2023-01-11T22:10:24.4624932Z STAGE:2023-01-11 21:51:11 35238:35238 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4625264Z STAGE:2023-01-11 21:51:11 35237:35237 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4625577Z STAGE:2023-01-11 21:51:11 35238:35238 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4625896Z STAGE:2023-01-11 21:51:11 35237:35237 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4626226Z STAGE:2023-01-11 21:51:11 35238:35238 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4626545Z STAGE:2023-01-11 21:51:11 35237:35237 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4626877Z STAGE:2023-01-11 21:51:11 35238:35238 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4627202Z STAGE:2023-01-11 21:51:11 35237:35237 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4627294Z ok (6.659s) 2023-01-11T22:10:24.4627313Z 2023-01-11T22:10:24.4627566Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4627672Z Ran 1 test in 6.659s 2023-01-11T22:10:24.4627691Z 2023-01-11T22:10:24.4627771Z OK 2023-01-11T22:10:24.4627791Z 2023-01-11T22:10:24.4627907Z Generating XML reports... 
2023-01-11T22:10:24.4628346Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215105.xml 2023-01-11T22:10:24.4628713Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4628870Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4629233Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4629416Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4629436Z 2023-01-11T22:10:24.4629534Z Running tests... 2023-01-11T22:10:24.4629788Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4630087Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4630349Z test_all_reduce_sum_complex (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4630618Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35351 2023-01-11T22:10:24.4630831Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35352 2023-01-11T22:10:24.4631183Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4631347Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4631716Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4631895Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4632247Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4632408Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4632771Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4632990Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4633221Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4633458Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4633850Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4634237Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4634455Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4634784Z STAGE:2023-01-11 21:51:18 35352:35352 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4635004Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4635334Z STAGE:2023-01-11 21:51:18 35351:35351 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4635605Z [1673473878.597067] [7c5487d9c02b:35352:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4635817Z [1673473880.210526] [7c5487d9c02b:35352:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4636050Z [1673473880.210526] [7c5487d9c02b:35352:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4636316Z [1673473878.576116] [7c5487d9c02b:35351:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.4636535Z [1673473880.240952] [7c5487d9c02b:35351:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4636770Z [1673473880.240952] [7c5487d9c02b:35351:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4637310Z STAGE:2023-01-11 21:51:20 35352:35352 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:51:20 35351:35351 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4637331Z 2023-01-11T22:10:24.4637671Z STAGE:2023-01-11 21:51:20 35351:35351 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4638011Z STAGE:2023-01-11 21:51:20 35352:35352 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4638332Z STAGE:2023-01-11 21:51:20 35352:35352 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4638647Z STAGE:2023-01-11 21:51:20 35351:35351 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4639030Z STAGE:2023-01-11 21:51:20 35352:35352 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4639340Z STAGE:2023-01-11 21:51:20 35351:35351 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4639678Z STAGE:2023-01-11 21:51:20 35352:35352 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4640010Z STAGE:2023-01-11 21:51:20 35351:35351 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4640109Z ok (6.661s) 2023-01-11T22:10:24.4640128Z 2023-01-11T22:10:24.4640384Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4640496Z Ran 1 test in 6.661s 2023-01-11T22:10:24.4640516Z 2023-01-11T22:10:24.4640598Z OK 2023-01-11T22:10:24.4640617Z 2023-01-11T22:10:24.4640738Z Generating XML reports... 
2023-01-11T22:10:24.4641164Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215114.xml 2023-01-11T22:10:24.4641533Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4641753Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4642134Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4642321Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4642340Z 2023-01-11T22:10:24.4642446Z Running tests... 2023-01-11T22:10:24.4642703Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4643013Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4643310Z test_all_reduce_sum_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Gloo and NCCL backends will have CUDA allReduce tested (0.002s) 2023-01-11T22:10:24.4643333Z 2023-01-11T22:10:24.4643576Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4643685Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4643704Z 2023-01-11T22:10:24.4643815Z OK (skipped=1) 2023-01-11T22:10:24.4643835Z 2023-01-11T22:10:24.4643957Z Generating XML reports... 
2023-01-11T22:10:24.4644397Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215123.xml 2023-01-11T22:10:24.4644764Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4644937Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4645308Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4645497Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4645517Z 2023-01-11T22:10:24.4645610Z Running tests... 2023-01-11T22:10:24.4645872Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4646181Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4646482Z test_all_reduce_sum_cuda_async (__main__.TestDistBackendWithSpawn) ... skip: Only Gloo and NCCL backends will have CUDA allReduce tested (0.002s) 2023-01-11T22:10:24.4646503Z 2023-01-11T22:10:24.4646759Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4646868Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4646887Z 2023-01-11T22:10:24.4646992Z OK (skipped=1) 2023-01-11T22:10:24.4647011Z 2023-01-11T22:10:24.4647134Z Generating XML reports... 
2023-01-11T22:10:24.4647573Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215126.xml 2023-01-11T22:10:24.4647920Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4648151Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4648533Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4648722Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4648741Z 2023-01-11T22:10:24.4648849Z Running tests... 2023-01-11T22:10:24.4649102Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4649406Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4649713Z test_all_reduce_sum_cuda_complex (__main__.TestDistBackendWithSpawn) ... skip: Only Gloo and NCCL backends will have CUDA allReduce tested (0.002s) 2023-01-11T22:10:24.4649733Z 2023-01-11T22:10:24.4649988Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4650084Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4650104Z 2023-01-11T22:10:24.4650214Z OK (skipped=1) 2023-01-11T22:10:24.4650233Z 2023-01-11T22:10:24.4650356Z Generating XML reports... 
2023-01-11T22:10:24.4650842Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215128.xml 2023-01-11T22:10:24.4651212Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4651387Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4651757Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4651945Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4651964Z 2023-01-11T22:10:24.4652074Z Running tests... 2023-01-11T22:10:24.4652316Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4652628Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4652868Z test_all_to_all (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports all_to_all (0.002s) 2023-01-11T22:10:24.4652891Z 2023-01-11T22:10:24.4653151Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4653263Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4653283Z 2023-01-11T22:10:24.4653387Z OK (skipped=1) 2023-01-11T22:10:24.4653406Z 2023-01-11T22:10:24.4653529Z Generating XML reports... 
2023-01-11T22:10:24.4653963Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215131.xml 2023-01-11T22:10:24.4654308Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4654482Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4654852Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4655043Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4655063Z 2023-01-11T22:10:24.4655173Z Running tests... 2023-01-11T22:10:24.4655428Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4655733Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4655983Z test_all_to_all_complex (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports all_to_all (0.002s) 2023-01-11T22:10:24.4656002Z 2023-01-11T22:10:24.4656263Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4656356Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4656375Z 2023-01-11T22:10:24.4656483Z OK (skipped=1) 2023-01-11T22:10:24.4656502Z 2023-01-11T22:10:24.4656956Z Generating XML reports... 
2023-01-11T22:10:24.4657414Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215133.xml 2023-01-11T22:10:24.4657875Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4658116Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4658544Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4658771Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4658792Z 2023-01-11T22:10:24.4658935Z Running tests... 2023-01-11T22:10:24.4659180Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4659523Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4659790Z test_all_to_all_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only NCCL supports CUDA all_to_all (0.002s) 2023-01-11T22:10:24.4659813Z 2023-01-11T22:10:24.4660212Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4660361Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4660380Z 2023-01-11T22:10:24.4660600Z OK (skipped=1) 2023-01-11T22:10:24.4660621Z 2023-01-11T22:10:24.4660822Z Generating XML reports... 
2023-01-11T22:10:24.4661306Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215135.xml 2023-01-11T22:10:24.4661706Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4661915Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4662274Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4662500Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4662520Z 2023-01-11T22:10:24.4662715Z Running tests... 2023-01-11T22:10:24.4663023Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4663372Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4663672Z test_all_to_all_cuda_complex (__main__.TestDistBackendWithSpawn) ... skip: Only NCCL supports CUDA all_to_all (0.002s) 2023-01-11T22:10:24.4663692Z 2023-01-11T22:10:24.4663989Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4664136Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4664156Z 2023-01-11T22:10:24.4664245Z OK (skipped=1) 2023-01-11T22:10:24.4664319Z 2023-01-11T22:10:24.4664425Z Generating XML reports... 
2023-01-11T22:10:24.4664898Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215138.xml 2023-01-11T22:10:24.4665346Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4665562Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4666001Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4666225Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4666245Z 2023-01-11T22:10:24.4666387Z Running tests... 2023-01-11T22:10:24.4666684Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4666974Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4667264Z test_all_to_all_full_group (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports all_to_all (0.002s) 2023-01-11T22:10:24.4667284Z 2023-01-11T22:10:24.4667582Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4667759Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4667779Z 2023-01-11T22:10:24.4667981Z OK (skipped=1) 2023-01-11T22:10:24.4668000Z 2023-01-11T22:10:24.4668161Z Generating XML reports... 
2023-01-11T22:10:24.4668645Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215140.xml 2023-01-11T22:10:24.4669046Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4669266Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4669623Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4669848Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4669868Z 2023-01-11T22:10:24.4670009Z Running tests... 2023-01-11T22:10:24.4670338Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4670715Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4671024Z test_all_to_all_full_group_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only NCCL supports CUDA all_to_all (0.002s) 2023-01-11T22:10:24.4671044Z 2023-01-11T22:10:24.4671385Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4671548Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4671568Z 2023-01-11T22:10:24.4671709Z OK (skipped=1) 2023-01-11T22:10:24.4671729Z 2023-01-11T22:10:24.4671834Z Generating XML reports... 
2023-01-11T22:10:24.4672309Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215143.xml 2023-01-11T22:10:24.4672710Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4672954Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4673361Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4673592Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4673612Z 2023-01-11T22:10:24.4673764Z Running tests... 2023-01-11T22:10:24.4674068Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4674360Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4674644Z test_all_to_all_group (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports all_to_all (0.002s) 2023-01-11T22:10:24.4674665Z 2023-01-11T22:10:24.4674956Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4675101Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4675120Z 2023-01-11T22:10:24.4675348Z OK (skipped=1) 2023-01-11T22:10:24.4675368Z 2023-01-11T22:10:24.4675526Z Generating XML reports... 
2023-01-11T22:10:24.4676008Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215145.xml 2023-01-11T22:10:24.4676412Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4676623Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4676982Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4677205Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4677225Z 2023-01-11T22:10:24.4677364Z Running tests... 2023-01-11T22:10:24.4677663Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4678039Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4678361Z test_all_to_all_group_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all_to_all_single (0.002s) 2023-01-11T22:10:24.4678381Z 2023-01-11T22:10:24.4678736Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4678881Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4678901Z 2023-01-11T22:10:24.4679043Z OK (skipped=1) 2023-01-11T22:10:24.4679064Z 2023-01-11T22:10:24.4679175Z Generating XML reports... 
2023-01-11T22:10:24.4679650Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215147.xml 2023-01-11T22:10:24.4680051Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4680302Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4680745Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4680970Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4680990Z 2023-01-11T22:10:24.4681133Z Running tests... 2023-01-11T22:10:24.4681433Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4681725Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4682089Z test_all_to_all_single_equal_split (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports CPU all_to_all_single (0.002s) 2023-01-11T22:10:24.4682110Z 2023-01-11T22:10:24.4682411Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4682571Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4682591Z 2023-01-11T22:10:24.4682734Z OK (skipped=1) 2023-01-11T22:10:24.4682753Z 2023-01-11T22:10:24.4682945Z Generating XML reports... 
2023-01-11T22:10:24.4683423Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215150.xml 2023-01-11T22:10:24.4683816Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4684030Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4684384Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4684611Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4684631Z 2023-01-11T22:10:24.4684786Z Running tests... 2023-01-11T22:10:24.4685114Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4685455Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4685811Z test_all_to_all_single_equal_split_complex (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports CPU all_to_all_single (0.002s) 2023-01-11T22:10:24.4685832Z 2023-01-11T22:10:24.4686126Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4686272Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4686292Z 2023-01-11T22:10:24.4686439Z OK (skipped=1) 2023-01-11T22:10:24.4686458Z 2023-01-11T22:10:24.4686563Z Generating XML reports... 
2023-01-11T22:10:24.4687047Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215152.xml 2023-01-11T22:10:24.4687452Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4687662Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4688067Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4688327Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4688346Z 2023-01-11T22:10:24.4688491Z Running tests... 2023-01-11T22:10:24.4688786Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4689136Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4689464Z test_all_to_all_single_equal_split_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all_to_all_single (0.002s) 2023-01-11T22:10:24.4689484Z 2023-01-11T22:10:24.4689789Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4689965Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4689986Z 2023-01-11T22:10:24.4690128Z OK (skipped=1) 2023-01-11T22:10:24.4690146Z 2023-01-11T22:10:24.4690300Z Generating XML reports... 
2023-01-11T22:10:24.4690807Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215154.xml 2023-01-11T22:10:24.4691219Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4691426Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4691835Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4692012Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4692031Z 2023-01-11T22:10:24.4692174Z Running tests... 2023-01-11T22:10:24.4692515Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4692862Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4693197Z test_all_to_all_single_equal_split_cuda_complex (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all_to_all_single (0.002s) 2023-01-11T22:10:24.4693217Z 2023-01-11T22:10:24.4693558Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4693705Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4693725Z 2023-01-11T22:10:24.4693868Z OK (skipped=1) 2023-01-11T22:10:24.4693887Z 2023-01-11T22:10:24.4693993Z Generating XML reports... 
2023-01-11T22:10:24.4694469Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215157.xml 2023-01-11T22:10:24.4694902Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4695113Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4695518Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4695752Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4695772Z 2023-01-11T22:10:24.4695946Z Running tests... 2023-01-11T22:10:24.4696242Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4696833Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4697126Z test_all_to_all_single_equal_split_full_group (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports CPU all_to_all_single (0.002s) 2023-01-11T22:10:24.4697213Z 2023-01-11T22:10:24.4697463Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4697611Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4697630Z 2023-01-11T22:10:24.4697775Z OK (skipped=1) 2023-01-11T22:10:24.4697796Z 2023-01-11T22:10:24.4697968Z Generating XML reports... 
2023-01-11T22:10:24.4698446Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215159.xml 2023-01-11T22:10:24.4698893Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4699103Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4699514Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4699685Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4699867Z 2023-01-11T22:10:24.4699969Z Running tests... 2023-01-11T22:10:24.4700280Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4700628Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4700971Z test_all_to_all_single_equal_split_full_group_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all_to_all_single (0.002s) 2023-01-11T22:10:24.4700991Z 2023-01-11T22:10:24.4701285Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4701470Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4701490Z 2023-01-11T22:10:24.4701631Z OK (skipped=1) 2023-01-11T22:10:24.4701650Z 2023-01-11T22:10:24.4701808Z Generating XML reports... 
2023-01-11T22:10:24.4702231Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215202.xml 2023-01-11T22:10:24.4702645Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4702859Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4703333Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4703569Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4703589Z 2023-01-11T22:10:24.4703731Z Running tests... 2023-01-11T22:10:24.4704066Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4704410Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4704683Z test_all_to_all_single_equal_split_group (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports CPU all_to_all_single (0.002s) 2023-01-11T22:10:24.4704798Z 2023-01-11T22:10:24.4705041Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4705193Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4705213Z 2023-01-11T22:10:24.4705355Z OK (skipped=1) 2023-01-11T22:10:24.4705374Z 2023-01-11T22:10:24.4705530Z Generating XML reports... 
2023-01-11T22:10:24.4706007Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215204.xml 2023-01-11T22:10:24.4706409Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4706652Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4732535Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4732799Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4732821Z 2023-01-11T22:10:24.4732918Z Running tests... 2023-01-11T22:10:24.4733224Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4733554Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4733865Z test_all_to_all_single_equal_split_group_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all_to_all_single (0.002s) 2023-01-11T22:10:24.4733886Z 2023-01-11T22:10:24.4734157Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4734271Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4734291Z 2023-01-11T22:10:24.4734402Z OK (skipped=1) 2023-01-11T22:10:24.4734421Z 2023-01-11T22:10:24.4734547Z Generating XML reports... 
2023-01-11T22:10:24.4735004Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215206.xml 2023-01-11T22:10:24.4735364Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4735541Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4736044Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4736238Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4736263Z 2023-01-11T22:10:24.4736372Z Running tests... 2023-01-11T22:10:24.4736906Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4737241Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4737528Z test_all_to_all_single_unequal_split (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports CPU all_to_all_single (0.002s) 2023-01-11T22:10:24.4737549Z 2023-01-11T22:10:24.4737812Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4737907Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4737926Z 2023-01-11T22:10:24.4738035Z OK (skipped=1) 2023-01-11T22:10:24.4738054Z 2023-01-11T22:10:24.4738178Z Generating XML reports... 
2023-01-11T22:10:24.4738625Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215209.xml 2023-01-11T22:10:24.4739107Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4739298Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4739681Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4739872Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4739892Z 2023-01-11T22:10:24.4740003Z Running tests... 2023-01-11T22:10:24.4740244Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4740551Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4740847Z test_all_to_all_single_unequal_split_complex (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports CPU all_to_all_single (0.002s) 2023-01-11T22:10:24.4740872Z 2023-01-11T22:10:24.4741128Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4741243Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4741263Z 2023-01-11T22:10:24.4741371Z OK (skipped=1) 2023-01-11T22:10:24.4741390Z 2023-01-11T22:10:24.4741514Z Generating XML reports... 
2023-01-11T22:10:24.4741953Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215211.xml 2023-01-11T22:10:24.4742300Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4742478Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4742853Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4743044Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4743067Z 2023-01-11T22:10:24.4743177Z Running tests... 2023-01-11T22:10:24.4743437Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4743749Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4744045Z test_all_to_all_single_unequal_split_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all_to_all_single (0.002s) 2023-01-11T22:10:24.4744066Z 2023-01-11T22:10:24.4744324Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4744417Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4744455Z 2023-01-11T22:10:24.4744546Z OK (skipped=1) 2023-01-11T22:10:24.4744564Z 2023-01-11T22:10:24.4744687Z Generating XML reports... 
2023-01-11T22:10:24.4745122Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215214.xml 2023-01-11T22:10:24.4745570Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4745747Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4746123Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4746315Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4746335Z 2023-01-11T22:10:24.4746444Z Running tests... 2023-01-11T22:10:24.4746682Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4746990Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4747293Z test_all_to_all_single_unequal_split_cuda_complex (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all_to_all_single (0.002s) 2023-01-11T22:10:24.4747313Z 2023-01-11T22:10:24.4747571Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4747686Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4747705Z 2023-01-11T22:10:24.4747814Z OK (skipped=1) 2023-01-11T22:10:24.4747833Z 2023-01-11T22:10:24.4748004Z Generating XML reports... 
2023-01-11T22:10:24.4748455Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215216.xml 2023-01-11T22:10:24.4748823Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4748980Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4749359Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4749550Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4749569Z 2023-01-11T22:10:24.4749679Z Running tests... 2023-01-11T22:10:24.4749943Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4750251Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4750554Z test_all_to_all_single_unequal_split_full_group (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports CPU all_to_all_single (0.002s) 2023-01-11T22:10:24.4750574Z 2023-01-11T22:10:24.4750828Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4750929Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4750948Z 2023-01-11T22:10:24.4751038Z OK (skipped=1) 2023-01-11T22:10:24.4751057Z 2023-01-11T22:10:24.4751167Z Generating XML reports... 
2023-01-11T22:10:24.4751597Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215218.xml 2023-01-11T22:10:24.4751950Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4752115Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4752480Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4752660Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4752680Z 2023-01-11T22:10:24.4752776Z Running tests... 2023-01-11T22:10:24.4753018Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4753311Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4753603Z test_all_to_all_single_unequal_split_full_group_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all_to_all_single (0.002s) 2023-01-11T22:10:24.4753623Z 2023-01-11T22:10:24.4753868Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4753968Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4753987Z 2023-01-11T22:10:24.4754142Z OK (skipped=1) 2023-01-11T22:10:24.4754162Z 2023-01-11T22:10:24.4754277Z Generating XML reports... 
2023-01-11T22:10:24.4754713Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215221.xml 2023-01-11T22:10:24.4755068Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4755226Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4755590Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4755766Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4755786Z 2023-01-11T22:10:24.4755882Z Running tests... 2023-01-11T22:10:24.4756127Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4756421Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4756705Z test_all_to_all_single_unequal_split_group (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports CPU all_to_all_single (0.002s) 2023-01-11T22:10:24.4756726Z 2023-01-11T22:10:24.4757047Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4757152Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4757172Z 2023-01-11T22:10:24.4757261Z OK (skipped=1) 2023-01-11T22:10:24.4757280Z 2023-01-11T22:10:24.4757392Z Generating XML reports... 
2023-01-11T22:10:24.4757818Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215223.xml 2023-01-11T22:10:24.4758168Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4758329Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4758689Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4758870Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4758890Z 2023-01-11T22:10:24.4758985Z Running tests... 2023-01-11T22:10:24.4759231Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4759519Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4759807Z test_all_to_all_single_unequal_split_group_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all_to_all_single (0.002s) 2023-01-11T22:10:24.4759827Z 2023-01-11T22:10:24.4760070Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4760168Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4760187Z 2023-01-11T22:10:24.4760282Z OK (skipped=1) 2023-01-11T22:10:24.4760301Z 2023-01-11T22:10:24.4760412Z Generating XML reports... 
2023-01-11T22:10:24.4760837Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215226.xml 2023-01-11T22:10:24.4761199Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4761361Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4761719Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4761898Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4761917Z 2023-01-11T22:10:24.4762012Z Running tests... 2023-01-11T22:10:24.4762258Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4762553Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4762801Z test_average_parameters (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4763079Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36356 2023-01-11T22:10:24.4763287Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36357 2023-01-11T22:10:24.4763639Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4763803Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4764164Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4764341Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4764693Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4764857Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4765217Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4765398Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4765675Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4765914Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4766304Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4766685Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4766901Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4767117Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4767377Z [1673473954.950396] [7c5487d9c02b:36357:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4767607Z [1673473954.963698] [7c5487d9c02b:36357:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4767834Z [1673473954.963698] [7c5487d9c02b:36357:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4768090Z [1673473954.946506] [7c5487d9c02b:36356:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4768300Z [1673473954.960446] [7c5487d9c02b:36356:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4768521Z [1673473954.960446] [7c5487d9c02b:36356:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4768753Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:10:24.4768985Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:10:24.4769373Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.4769754Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 
2023-01-11T22:10:24.4769845Z ok (7.465s) 2023-01-11T22:10:24.4769865Z 2023-01-11T22:10:24.4770117Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4770217Z Ran 1 test in 7.465s 2023-01-11T22:10:24.4770237Z 2023-01-11T22:10:24.4770311Z OK 2023-01-11T22:10:24.4770337Z 2023-01-11T22:10:24.4770443Z Generating XML reports... 2023-01-11T22:10:24.4770871Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215228.xml 2023-01-11T22:10:24.4771283Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4771447Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4771815Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4771994Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4772014Z 2023-01-11T22:10:24.4772110Z Running tests... 2023-01-11T22:10:24.4772362Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4772651Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4772899Z test_backend_full_group (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4773106Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36480 2023-01-11T22:10:24.4773314Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36481 2023-01-11T22:10:24.4773712Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4773879Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4774244Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4774419Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4774765Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4774928Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4775294Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4775470Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4775708Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4775941Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4776329Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4776945Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4777173Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4777382Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4777522Z skip: Need at least 3 CUDA devices (4.254s) 2023-01-11T22:10:24.4777542Z 2023-01-11T22:10:24.4777804Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4777911Z Ran 1 test in 4.255s 2023-01-11T22:10:24.4777931Z 2023-01-11T22:10:24.4778026Z OK (skipped=1) 2023-01-11T22:10:24.4778045Z 2023-01-11T22:10:24.4778161Z Generating XML reports... 2023-01-11T22:10:24.4778595Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215238.xml 2023-01-11T22:10:24.4778951Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4779114Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4779470Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4779647Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4779666Z 2023-01-11T22:10:24.4779763Z Running tests... 2023-01-11T22:10:24.4780010Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4780405Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4780647Z test_backend_group (__main__.TestDistBackendWithSpawn) ... 
skip: Test requires world size of 3 (0.002s) 2023-01-11T22:10:24.4780667Z 2023-01-11T22:10:24.4780913Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4781013Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4781032Z 2023-01-11T22:10:24.4781121Z OK (skipped=1) 2023-01-11T22:10:24.4781147Z 2023-01-11T22:10:24.4781254Z Generating XML reports... 2023-01-11T22:10:24.4781683Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215245.xml 2023-01-11T22:10:24.4782036Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4782198Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4782564Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4782805Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4782827Z 2023-01-11T22:10:24.4782930Z Running tests... 2023-01-11T22:10:24.4783180Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4783472Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4783707Z test_barrier (__main__.TestDistBackendWithSpawn) ... skip: ucc does not support CPU barrier (0.002s) 2023-01-11T22:10:24.4783726Z 2023-01-11T22:10:24.4783974Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4784073Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4784092Z 2023-01-11T22:10:24.4784187Z OK (skipped=1) 2023-01-11T22:10:24.4784206Z 2023-01-11T22:10:24.4784317Z Generating XML reports... 
2023-01-11T22:10:24.4784751Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215247.xml 2023-01-11T22:10:24.4785108Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4785272Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4785628Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4785806Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4785826Z 2023-01-11T22:10:24.4785922Z Running tests... 2023-01-11T22:10:24.4786170Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4786464Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4786700Z test_barrier_cuda (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4786911Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36649 2023-01-11T22:10:24.4787118Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36650 2023-01-11T22:10:24.4787466Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4787631Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4787991Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4788167Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4788517Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4788678Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4789035Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4789269Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4789502Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4789728Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4790113Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4790493Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4790707Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4790923Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4791185Z [1673473975.451308] [7c5487d9c02b:36649:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4791450Z [1673473975.464935] [7c5487d9c02b:36649:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4791682Z [1673473975.464935] [7c5487d9c02b:36649:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4791941Z [1673473975.455071] [7c5487d9c02b:36650:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4792156Z [1673473975.468458] [7c5487d9c02b:36650:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4792372Z [1673473975.468458] [7c5487d9c02b:36650:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4792469Z ok (6.934s) 2023-01-11T22:10:24.4792489Z 2023-01-11T22:10:24.4792747Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4792848Z Ran 1 test in 6.935s 2023-01-11T22:10:24.4792868Z 2023-01-11T22:10:24.4792951Z OK 2023-01-11T22:10:24.4792970Z 2023-01-11T22:10:24.4793082Z Generating XML reports... 
2023-01-11T22:10:24.4793512Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215250.xml 2023-01-11T22:10:24.4793868Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4794026Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4794387Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4794565Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4794588Z 2023-01-11T22:10:24.4794684Z Running tests... 2023-01-11T22:10:24.4794937Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4795237Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4795486Z test_barrier_full_group (__main__.TestDistBackendWithSpawn) ... skip: ucc does not support CPU barrier (0.002s) 2023-01-11T22:10:24.4795506Z 2023-01-11T22:10:24.4795754Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4795855Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4795874Z 2023-01-11T22:10:24.4795963Z OK (skipped=1) 2023-01-11T22:10:24.4795982Z 2023-01-11T22:10:24.4796094Z Generating XML reports... 
2023-01-11T22:10:24.4796519Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215259.xml 2023-01-11T22:10:24.4796871Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4797090Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4797460Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4797640Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4797659Z 2023-01-11T22:10:24.4797756Z Running tests... 2023-01-11T22:10:24.4798003Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4798294Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4798546Z test_barrier_full_group_cuda (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4798752Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36796 2023-01-11T22:10:24.4798958Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36797 2023-01-11T22:10:24.4799314Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4799475Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4799899Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4800082Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4800429Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4800591Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4800951Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4801127Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4801357Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4801593Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4801979Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4802358Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4802575Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4802780Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4802927Z skip: Skipped due to small world size. (4.204s) 2023-01-11T22:10:24.4802946Z 2023-01-11T22:10:24.4803194Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4803292Z Ran 1 test in 4.204s 2023-01-11T22:10:24.4803315Z 2023-01-11T22:10:24.4803410Z OK (skipped=1) 2023-01-11T22:10:24.4803430Z 2023-01-11T22:10:24.4803541Z Generating XML reports... 2023-01-11T22:10:24.4803970Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215301.xml 2023-01-11T22:10:24.4804324Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4804487Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4804844Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4805020Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4805039Z 2023-01-11T22:10:24.4805136Z Running tests... 2023-01-11T22:10:24.4805384Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4805679Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4805982Z test_barrier_group (__main__.TestDistBackendWithSpawn) ... 
skip: ucc does not support CPU barrier (0.002s) 2023-01-11T22:10:24.4806002Z 2023-01-11T22:10:24.4806253Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4806352Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4806371Z 2023-01-11T22:10:24.4806461Z OK (skipped=1) 2023-01-11T22:10:24.4806487Z 2023-01-11T22:10:24.4806592Z Generating XML reports... 2023-01-11T22:10:24.4807017Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215308.xml 2023-01-11T22:10:24.4807369Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4807584Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4807950Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4808132Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4808152Z 2023-01-11T22:10:24.4808246Z Running tests... 2023-01-11T22:10:24.4808542Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4808839Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4809085Z test_barrier_group_cuda (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4809291Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36932 2023-01-11T22:10:24.4809495Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36933 2023-01-11T22:10:24.4809843Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4810005Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4810371Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4810555Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4810902Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4811072Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4811443Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4811624Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4811864Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4812103Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4812503Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4812898Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4813123Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4813330Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4813485Z skip: Skipped due to small world size. (4.351s) 2023-01-11T22:10:24.4813505Z 2023-01-11T22:10:24.4813765Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4813875Z Ran 1 test in 4.351s 2023-01-11T22:10:24.4813894Z 2023-01-11T22:10:24.4813999Z OK (skipped=1) 2023-01-11T22:10:24.4814018Z 2023-01-11T22:10:24.4814139Z Generating XML reports... 2023-01-11T22:10:24.4814575Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215311.xml 2023-01-11T22:10:24.4814999Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4815179Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4815533Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4815720Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4815739Z 2023-01-11T22:10:24.4815844Z Running tests... 2023-01-11T22:10:24.4816097Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4816400Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4817000Z test_barrier_timeout_full_group (__main__.TestDistBackendWithSpawn) ... 
skip: Only gloo backend supports timeouts (0.002s) 2023-01-11T22:10:24.4817025Z 2023-01-11T22:10:24.4817295Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4817404Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4817427Z 2023-01-11T22:10:24.4817592Z OK (skipped=1) 2023-01-11T22:10:24.4817627Z 2023-01-11T22:10:24.4817740Z Generating XML reports... 2023-01-11T22:10:24.4818183Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215317.xml 2023-01-11T22:10:24.4818547Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4818722Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4819090Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4819273Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4819292Z 2023-01-11T22:10:24.4819401Z Running tests... 2023-01-11T22:10:24.4819658Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4819948Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4820212Z test_barrier_timeout_global (__main__.TestDistBackendWithSpawn) ... skip: Only gloo backend supports timeouts (0.002s) 2023-01-11T22:10:24.4820232Z 2023-01-11T22:10:24.4820486Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4820592Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4820611Z 2023-01-11T22:10:24.4820718Z OK (skipped=1) 2023-01-11T22:10:24.4820737Z 2023-01-11T22:10:24.4820855Z Generating XML reports... 
2023-01-11T22:10:24.4821292Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215320.xml 2023-01-11T22:10:24.4821654Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4821830Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4822190Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4822375Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4822394Z 2023-01-11T22:10:24.4822499Z Running tests... 2023-01-11T22:10:24.4822753Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4823053Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4823319Z test_barrier_timeout_group (__main__.TestDistBackendWithSpawn) ... skip: Only gloo backend supports timeouts (0.002s) 2023-01-11T22:10:24.4823339Z 2023-01-11T22:10:24.4823597Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4823701Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4823720Z 2023-01-11T22:10:24.4823897Z OK (skipped=1) 2023-01-11T22:10:24.4823917Z 2023-01-11T22:10:24.4824022Z Generating XML reports... 
2023-01-11T22:10:24.4824468Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215322.xml 2023-01-11T22:10:24.4824828Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4824993Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4825364Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4825550Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4825569Z 2023-01-11T22:10:24.4825676Z Running tests... 2023-01-11T22:10:24.4825933Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4826222Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4826476Z test_batch_isend_irecv_gloo (__main__.TestDistBackendWithSpawn) ... skip: GLOO Batch Send Recv CPU (0.002s) 2023-01-11T22:10:24.4826496Z 2023-01-11T22:10:24.4826791Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4826907Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4826927Z 2023-01-11T22:10:24.4827032Z OK (skipped=1) 2023-01-11T22:10:24.4827051Z 2023-01-11T22:10:24.4827174Z Generating XML reports... 
2023-01-11T22:10:24.4827610Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215325.xml 2023-01-11T22:10:24.4827971Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4828143Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4828499Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4828689Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4828709Z 2023-01-11T22:10:24.4828813Z Running tests... 2023-01-11T22:10:24.4829069Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4829375Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4829634Z test_batch_isend_irecv_gloo_tags (__main__.TestDistBackendWithSpawn) ... skip: GLOO Batch Send Recv CPU (0.002s) 2023-01-11T22:10:24.4829653Z 2023-01-11T22:10:24.4829909Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4830016Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4830035Z 2023-01-11T22:10:24.4830140Z OK (skipped=1) 2023-01-11T22:10:24.4830159Z 2023-01-11T22:10:24.4830264Z Generating XML reports... 
2023-01-11T22:10:24.4830698Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215327.xml 2023-01-11T22:10:24.4831062Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4831237Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4831610Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4831800Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4831819Z 2023-01-11T22:10:24.4831922Z Running tests... 2023-01-11T22:10:24.4832178Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4832467Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4832732Z test_batch_isend_irecv_mixed_backend_err (__main__.TestDistBackendWithSpawn) ... skip: NCCL Batch Send Recv Only (0.002s) 2023-01-11T22:10:24.4832752Z 2023-01-11T22:10:24.4833061Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4833168Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4833187Z 2023-01-11T22:10:24.4833294Z OK (skipped=1) 2023-01-11T22:10:24.4833312Z 2023-01-11T22:10:24.4833431Z Generating XML reports... 
2023-01-11T22:10:24.4833864Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215329.xml 2023-01-11T22:10:24.4834226Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4834396Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4834750Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4834937Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4834956Z 2023-01-11T22:10:24.4835061Z Running tests... 2023-01-11T22:10:24.4835320Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4835620Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4835918Z test_batch_isend_irecv_nccl (__main__.TestDistBackendWithSpawn) ... skip: NCCL Batch Send Recv Only (0.003s) 2023-01-11T22:10:24.4835939Z 2023-01-11T22:10:24.4836201Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4836310Z Ran 1 test in 0.003s 2023-01-11T22:10:24.4836329Z 2023-01-11T22:10:24.4836437Z OK (skipped=1) 2023-01-11T22:10:24.4836456Z 2023-01-11T22:10:24.4836561Z Generating XML reports... 
2023-01-11T22:10:24.4837000Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215332.xml 2023-01-11T22:10:24.4837366Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4837540Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4837909Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4838098Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4838118Z 2023-01-11T22:10:24.4838228Z Running tests... 2023-01-11T22:10:24.4838486Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4838789Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4839035Z test_batch_isend_irecv_no_rank_zero_nccl (__main__.TestDistBackendWithSpawn) ... skip: NCCL Batch Send Recv Only (0.002s) 2023-01-11T22:10:24.4839055Z 2023-01-11T22:10:24.4839303Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4839411Z Ran 1 test in 0.003s 2023-01-11T22:10:24.4839430Z 2023-01-11T22:10:24.4839533Z OK (skipped=1) 2023-01-11T22:10:24.4839556Z 2023-01-11T22:10:24.4839675Z Generating XML reports... 
2023-01-11T22:10:24.4840111Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215334.xml 2023-01-11T22:10:24.4840476Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4840648Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4841000Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4841183Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4841203Z 2023-01-11T22:10:24.4841305Z Running tests... 2023-01-11T22:10:24.4841558Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4841862Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4842169Z test_batch_isend_irecv_op_err (__main__.TestDistBackendWithSpawn) ... skip: NCCL Batch Send Recv Only (0.002s) 2023-01-11T22:10:24.4842189Z 2023-01-11T22:10:24.4842451Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4842559Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4842579Z 2023-01-11T22:10:24.4842681Z OK (skipped=1) 2023-01-11T22:10:24.4842699Z 2023-01-11T22:10:24.4842805Z Generating XML reports... 
2023-01-11T22:10:24.4843240Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215337.xml 2023-01-11T22:10:24.4843602Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4843776Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4844145Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4844332Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4844352Z 2023-01-11T22:10:24.4844458Z Running tests... 2023-01-11T22:10:24.4844759Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4845075Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4845319Z test_batch_isend_irecv_op_list_err (__main__.TestDistBackendWithSpawn) ... skip: NCCL Batch Send Recv Only (0.002s) 2023-01-11T22:10:24.4845338Z 2023-01-11T22:10:24.4845590Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4845699Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4845718Z 2023-01-11T22:10:24.4845825Z OK (skipped=1) 2023-01-11T22:10:24.4845844Z 2023-01-11T22:10:24.4845965Z Generating XML reports... 
2023-01-11T22:10:24.4846398Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215339.xml 2023-01-11T22:10:24.4846765Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4846933Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4847307Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4847478Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4847497Z 2023-01-11T22:10:24.4847603Z Running tests... 2023-01-11T22:10:24.4847861Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4848164Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4848435Z test_batch_isend_irecv_ring_exchange_nccl (__main__.TestDistBackendWithSpawn) ... skip: NCCL Batch Send Recv Only (0.002s) 2023-01-11T22:10:24.4848455Z 2023-01-11T22:10:24.4848699Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4848811Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4848831Z 2023-01-11T22:10:24.4848933Z OK (skipped=1) 2023-01-11T22:10:24.4848952Z 2023-01-11T22:10:24.4849060Z Generating XML reports... 
2023-01-11T22:10:24.4849491Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215341.xml 2023-01-11T22:10:24.4849847Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4850013Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4850381Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4850567Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4850586Z 2023-01-11T22:10:24.4850691Z Running tests... 2023-01-11T22:10:24.4850944Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4851306Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4851551Z test_batch_isend_irecv_self_nccl (__main__.TestDistBackendWithSpawn) ... skip: NCCL Batch Send Recv Only (0.002s) 2023-01-11T22:10:24.4851581Z 2023-01-11T22:10:24.4851819Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4851927Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4851946Z 2023-01-11T22:10:24.4852051Z OK (skipped=1) 2023-01-11T22:10:24.4852069Z 2023-01-11T22:10:24.4852187Z Generating XML reports... 
2023-01-11T22:10:24.4852622Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215344.xml 2023-01-11T22:10:24.4852982Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4853153Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4853527Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4853745Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4853766Z 2023-01-11T22:10:24.4853873Z Running tests... 2023-01-11T22:10:24.4854131Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4854433Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4854690Z test_batch_isend_irecv_tensor_err (__main__.TestDistBackendWithSpawn) ... skip: NCCL Batch Send Recv Only (0.002s) 2023-01-11T22:10:24.4854709Z 2023-01-11T22:10:24.4854965Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4855072Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4855091Z 2023-01-11T22:10:24.4855193Z OK (skipped=1) 2023-01-11T22:10:24.4855213Z 2023-01-11T22:10:24.4855335Z Generating XML reports... 
2023-01-11T22:10:24.4855752Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215346.xml 2023-01-11T22:10:24.4856113Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4856281Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4856891Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4857084Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4857104Z 2023-01-11T22:10:24.4857209Z Running tests... 2023-01-11T22:10:24.4857468Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4857772Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4857998Z test_broadcast (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4858220Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37464 2023-01-11T22:10:24.4858440Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37465 2023-01-11T22:10:24.4858808Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4858979Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4859345Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4859526Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4859887Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4860056Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4860507Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4860696Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4860942Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4861183Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4861573Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4861957Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4862184Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4862511Z STAGE:2023-01-11 21:53:52 37465:37465 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4862780Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4863195Z STAGE:2023-01-11 21:53:53 37464:37464 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4863481Z [1673474033.035023] [7c5487d9c02b:37465:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4863707Z [1673474034.697067] [7c5487d9c02b:37465:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4863940Z [1673474034.697067] [7c5487d9c02b:37465:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4864208Z [1673474033.014701] [7c5487d9c02b:37464:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.4864436Z [1673474034.679436] [7c5487d9c02b:37464:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4864676Z [1673474034.679436] [7c5487d9c02b:37464:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4865225Z STAGE:2023-01-11 21:53:55 37465:37465 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:53:55 37464:37464 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4865246Z 2023-01-11T22:10:24.4865589Z STAGE:2023-01-11 21:53:55 37465:37465 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4865927Z STAGE:2023-01-11 21:53:55 37464:37464 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4866231Z STAGE:2023-01-11 21:53:55 37464:37464 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4866547Z STAGE:2023-01-11 21:53:55 37465:37465 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4866880Z STAGE:2023-01-11 21:53:55 37464:37464 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4867204Z STAGE:2023-01-11 21:53:55 37465:37465 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4867762Z STAGE:2023-01-11 21:53:55 37464:37464 ActivityProfilerController.cpp:310] Completed Stage: Post ProcessingSTAGE:2023-01-11 21:53:55 37465:37465 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4867784Z 2023-01-11T22:10:24.4868098Z STAGE:2023-01-11 21:53:55 37465:37465 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4868410Z STAGE:2023-01-11 21:53:55 37464:37464 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4868728Z STAGE:2023-01-11 21:53:55 37465:37465 ActivityProfilerController.cpp:306] Completed Stage: 
Collection 2023-01-11T22:10:24.4869046Z STAGE:2023-01-11 21:53:55 37464:37464 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4869449Z STAGE:2023-01-11 21:53:55 37465:37465 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4869773Z STAGE:2023-01-11 21:53:55 37464:37464 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4870088Z STAGE:2023-01-11 21:53:55 37464:37464 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4870403Z STAGE:2023-01-11 21:53:55 37465:37465 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4870727Z STAGE:2023-01-11 21:53:55 37464:37464 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4871046Z STAGE:2023-01-11 21:53:55 37465:37465 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4871377Z STAGE:2023-01-11 21:53:55 37464:37464 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4871712Z STAGE:2023-01-11 21:53:55 37465:37465 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4872032Z STAGE:2023-01-11 21:53:55 37465:37465 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4872390Z STAGE:2023-01-11 21:53:55 37464:37464 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4872705Z STAGE:2023-01-11 21:53:55 37465:37465 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4873248Z STAGE:2023-01-11 21:53:55 37464:37464 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:53:55 37465:37465 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4873268Z 2023-01-11T22:10:24.4873605Z STAGE:2023-01-11 21:53:55 37464:37464 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 
2023-01-11T22:10:24.4873918Z STAGE:2023-01-11 21:53:55 37464:37464 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4874232Z STAGE:2023-01-11 21:53:55 37465:37465 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4874553Z STAGE:2023-01-11 21:53:55 37464:37464 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4874870Z STAGE:2023-01-11 21:53:55 37465:37465 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4875202Z STAGE:2023-01-11 21:53:55 37464:37464 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4875532Z STAGE:2023-01-11 21:53:55 37465:37465 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4875848Z STAGE:2023-01-11 21:53:55 37465:37465 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4876144Z STAGE:2023-01-11 21:53:55 37464:37464 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4876465Z STAGE:2023-01-11 21:53:55 37465:37465 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4876788Z STAGE:2023-01-11 21:53:55 37464:37464 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4877121Z STAGE:2023-01-11 21:53:55 37465:37465 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4877453Z STAGE:2023-01-11 21:53:55 37464:37464 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4877763Z STAGE:2023-01-11 21:53:55 37464:37464 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4878075Z STAGE:2023-01-11 21:53:55 37465:37465 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4878390Z STAGE:2023-01-11 21:53:55 37464:37464 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4878706Z 
STAGE:2023-01-11 21:53:55 37465:37465 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4879025Z STAGE:2023-01-11 21:53:55 37464:37464 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4879418Z STAGE:2023-01-11 21:53:55 37465:37465 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4879735Z STAGE:2023-01-11 21:53:55 37465:37465 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4880048Z STAGE:2023-01-11 21:53:55 37464:37464 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4880366Z STAGE:2023-01-11 21:53:55 37465:37465 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4880679Z STAGE:2023-01-11 21:53:55 37464:37464 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4881008Z STAGE:2023-01-11 21:53:55 37465:37465 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4881334Z STAGE:2023-01-11 21:53:55 37464:37464 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4881648Z STAGE:2023-01-11 21:53:55 37464:37464 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4881947Z STAGE:2023-01-11 21:53:55 37465:37465 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4882315Z STAGE:2023-01-11 21:53:55 37464:37464 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4882640Z STAGE:2023-01-11 21:53:55 37465:37465 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4882971Z STAGE:2023-01-11 21:53:55 37464:37464 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4883304Z STAGE:2023-01-11 21:53:55 37465:37465 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4883619Z STAGE:2023-01-11 
21:53:55 37465:37465 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4883933Z STAGE:2023-01-11 21:53:55 37464:37464 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4884257Z STAGE:2023-01-11 21:53:55 37465:37465 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4884558Z STAGE:2023-01-11 21:53:55 37464:37464 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4884893Z STAGE:2023-01-11 21:53:55 37465:37465 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4885224Z STAGE:2023-01-11 21:53:55 37464:37464 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4885530Z STAGE:2023-01-11 21:53:55 37464:37464 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4885840Z STAGE:2023-01-11 21:53:55 37465:37465 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4886156Z STAGE:2023-01-11 21:53:55 37464:37464 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4886473Z STAGE:2023-01-11 21:53:55 37465:37465 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4886806Z STAGE:2023-01-11 21:53:55 37464:37464 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4887136Z STAGE:2023-01-11 21:53:55 37465:37465 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4887220Z ok (6.781s) 2023-01-11T22:10:24.4887255Z 2023-01-11T22:10:24.4887501Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4887607Z Ran 1 test in 6.781s 2023-01-11T22:10:24.4887626Z 2023-01-11T22:10:24.4887717Z OK 2023-01-11T22:10:24.4887736Z 2023-01-11T22:10:24.4887858Z Generating XML reports... 
2023-01-11T22:10:24.4888300Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215349.xml 2023-01-11T22:10:24.4888667Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4888836Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4889265Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4889441Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4889461Z 2023-01-11T22:10:24.4889565Z Running tests... 2023-01-11T22:10:24.4889822Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4890129Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4890404Z test_broadcast_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Gloo and Nccl backend supports CUDA allReduce (0.002s) 2023-01-11T22:10:24.4890423Z 2023-01-11T22:10:24.4890678Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4890787Z Ran 1 test in 0.002s 2023-01-11T22:10:24.4890806Z 2023-01-11T22:10:24.4890914Z OK (skipped=1) 2023-01-11T22:10:24.4890934Z 2023-01-11T22:10:24.4891059Z Generating XML reports... 
2023-01-11T22:10:24.4891481Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215358.xml 2023-01-11T22:10:24.4891893Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4892075Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4892454Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4892640Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4892660Z 2023-01-11T22:10:24.4892760Z Running tests... 2023-01-11T22:10:24.4893010Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4893316Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4893557Z test_broadcast_full_group (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4893775Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37611 2023-01-11T22:10:24.4893991Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37612 2023-01-11T22:10:24.4894365Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4894541Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4894915Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4895101Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4895465Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4895637Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4895996Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4896185Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4896430Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4896983Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4897404Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4897797Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4898021Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4898255Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:10:24.4898601Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4898822Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:10:24.4899222Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.4899555Z STAGE:2023-01-11 21:54:04 37611:37611 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4899942Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.4900267Z STAGE:2023-01-11 21:54:04 37612:37612 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4900536Z [1673474044.778521] [7c5487d9c02b:37612:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4900767Z [1673474046.405057] [7c5487d9c02b:37612:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4901059Z [1673474046.405057] [7c5487d9c02b:37612:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4901334Z [1673474044.757497] [7c5487d9c02b:37611:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.4901545Z [1673474046.420905] [7c5487d9c02b:37611:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4901775Z [1673474046.420905] [7c5487d9c02b:37611:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4902317Z STAGE:2023-01-11 21:54:06 37612:37612 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:54:06 37611:37611 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4902343Z 2023-01-11T22:10:24.4902683Z STAGE:2023-01-11 21:54:06 37612:37612 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4903021Z STAGE:2023-01-11 21:54:06 37611:37611 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4903338Z STAGE:2023-01-11 21:54:06 37611:37611 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4903647Z STAGE:2023-01-11 21:54:06 37612:37612 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4903967Z STAGE:2023-01-11 21:54:06 37611:37611 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4904282Z STAGE:2023-01-11 21:54:06 37612:37612 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4904834Z STAGE:2023-01-11 21:54:06 37611:37611 ActivityProfilerController.cpp:310] Completed Stage: Post ProcessingSTAGE:2023-01-11 21:54:06 37612:37612 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4906341Z 2023-01-11T22:10:24.4906699Z STAGE:2023-01-11 21:54:06 37612:37612 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4907010Z STAGE:2023-01-11 21:54:06 37611:37611 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4907315Z STAGE:2023-01-11 21:54:06 37612:37612 ActivityProfilerController.cpp:306] Completed Stage: 
Collection 2023-01-11T22:10:24.4907916Z STAGE:2023-01-11 21:54:06 37611:37611 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:54:06 37612:37612 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4907937Z 2023-01-11T22:10:24.4908280Z STAGE:2023-01-11 21:54:06 37611:37611 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4908600Z STAGE:2023-01-11 21:54:06 37611:37611 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4908991Z STAGE:2023-01-11 21:54:06 37612:37612 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4909318Z STAGE:2023-01-11 21:54:06 37611:37611 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4909642Z STAGE:2023-01-11 21:54:06 37612:37612 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4909985Z STAGE:2023-01-11 21:54:06 37611:37611 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4910323Z STAGE:2023-01-11 21:54:06 37612:37612 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4910643Z STAGE:2023-01-11 21:54:06 37612:37612 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4910943Z STAGE:2023-01-11 21:54:06 37611:37611 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4911264Z STAGE:2023-01-11 21:54:06 37612:37612 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4911590Z STAGE:2023-01-11 21:54:06 37611:37611 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4911968Z STAGE:2023-01-11 21:54:06 37612:37612 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4912308Z STAGE:2023-01-11 21:54:06 37611:37611 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 
2023-01-11T22:10:24.4912623Z STAGE:2023-01-11 21:54:06 37611:37611 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4912933Z STAGE:2023-01-11 21:54:06 37612:37612 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4913250Z STAGE:2023-01-11 21:54:06 37611:37611 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4913551Z STAGE:2023-01-11 21:54:06 37612:37612 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4913888Z STAGE:2023-01-11 21:54:06 37611:37611 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4914224Z STAGE:2023-01-11 21:54:06 37612:37612 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4914542Z STAGE:2023-01-11 21:54:06 37612:37612 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4914860Z STAGE:2023-01-11 21:54:06 37611:37611 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4915184Z STAGE:2023-01-11 21:54:06 37612:37612 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4915503Z STAGE:2023-01-11 21:54:06 37611:37611 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4915839Z STAGE:2023-01-11 21:54:06 37612:37612 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4916171Z STAGE:2023-01-11 21:54:06 37611:37611 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4916474Z STAGE:2023-01-11 21:54:06 37611:37611 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4916788Z STAGE:2023-01-11 21:54:06 37612:37612 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4917113Z STAGE:2023-01-11 21:54:06 37611:37611 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4917432Z 
STAGE:2023-01-11 21:54:06 37612:37612 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4917765Z STAGE:2023-01-11 21:54:06 37611:37611 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4918099Z STAGE:2023-01-11 21:54:06 37612:37612 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4918416Z STAGE:2023-01-11 21:54:06 37612:37612 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4918731Z STAGE:2023-01-11 21:54:06 37611:37611 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4919110Z STAGE:2023-01-11 21:54:06 37612:37612 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4919414Z STAGE:2023-01-11 21:54:06 37611:37611 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4919751Z STAGE:2023-01-11 21:54:06 37612:37612 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4920084Z STAGE:2023-01-11 21:54:06 37611:37611 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4920399Z STAGE:2023-01-11 21:54:06 37611:37611 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4920708Z STAGE:2023-01-11 21:54:06 37612:37612 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4921027Z STAGE:2023-01-11 21:54:06 37611:37611 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4921342Z STAGE:2023-01-11 21:54:06 37612:37612 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4921677Z STAGE:2023-01-11 21:54:06 37611:37611 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4922049Z STAGE:2023-01-11 21:54:06 37612:37612 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4922356Z STAGE:2023-01-11 
21:54:06 37612:37612 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4922668Z STAGE:2023-01-11 21:54:06 37611:37611 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4922987Z STAGE:2023-01-11 21:54:06 37612:37612 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4923524Z STAGE:2023-01-11 21:54:06 37611:37611 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:54:06 37612:37612 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4923544Z 2023-01-11T22:10:24.4923877Z STAGE:2023-01-11 21:54:06 37611:37611 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4924196Z STAGE:2023-01-11 21:54:06 37611:37611 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4924514Z STAGE:2023-01-11 21:54:06 37612:37612 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.4924836Z STAGE:2023-01-11 21:54:06 37611:37611 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4925155Z STAGE:2023-01-11 21:54:06 37612:37612 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.4925489Z STAGE:2023-01-11 21:54:06 37611:37611 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4925805Z STAGE:2023-01-11 21:54:06 37612:37612 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.4925905Z ok (6.758s) 2023-01-11T22:10:24.4925925Z 2023-01-11T22:10:24.4926188Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4926303Z Ran 1 test in 6.758s 2023-01-11T22:10:24.4926322Z 2023-01-11T22:10:24.4926412Z OK 2023-01-11T22:10:24.4926430Z 2023-01-11T22:10:24.4926553Z Generating XML reports... 
2023-01-11T22:10:24.4926996Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215400.xml 2023-01-11T22:10:24.4927362Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4927537Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4927897Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4928080Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4928100Z 2023-01-11T22:10:24.4928206Z Running tests... 2023-01-11T22:10:24.4928461Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4928838Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4929096Z test_broadcast_group (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4929317Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37725 2023-01-11T22:10:24.4929535Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37726 2023-01-11T22:10:24.4929883Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4930056Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4930425Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4930610Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4930966Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4931136Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4931559Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4931751Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4931975Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4932215Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4932615Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4933005Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4933229Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4933452Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4933608Z skip: Skipped due to small world size. (4.253s) 2023-01-11T22:10:24.4933628Z 2023-01-11T22:10:24.4933878Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4933985Z Ran 1 test in 4.253s 2023-01-11T22:10:24.4934004Z 2023-01-11T22:10:24.4934094Z OK (skipped=1) 2023-01-11T22:10:24.4934130Z 2023-01-11T22:10:24.4934234Z Generating XML reports... 2023-01-11T22:10:24.4934674Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215410.xml 2023-01-11T22:10:24.4935038Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4935209Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4935584Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4935772Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4935794Z 2023-01-11T22:10:24.4935903Z Running tests... 2023-01-11T22:10:24.4936158Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4936447Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4937070Z test_broadcast_multigpu (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4937304Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37828 2023-01-11T22:10:24.4937521Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37829 2023-01-11T22:10:24.4937898Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4938168Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4938550Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4938741Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4939084Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4939257Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4939624Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4939811Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4940058Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4940300Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4940701Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4941148Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4941381Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4941589Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4942365Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1478: UserWarning: torch.distributed.broadcast_multigpu will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#multi-gpu-collective-functions 2023-01-11T22:10:24.4942477Z warnings.warn( 2023-01-11T22:10:24.4943234Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1478: UserWarning: torch.distributed.broadcast_multigpu will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#multi-gpu-collective-functions 2023-01-11T22:10:24.4943348Z warnings.warn( 2023-01-11T22:10:24.4943619Z [1673474062.188457] [7c5487d9c02b:37829:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4943844Z [1673474062.202065] [7c5487d9c02b:37829:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4944075Z [1673474062.202065] [7c5487d9c02b:37829:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4944339Z [1673474062.185988] [7c5487d9c02b:37828:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.4944565Z [1673474062.199670] [7c5487d9c02b:37828:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4944799Z [1673474062.199670] [7c5487d9c02b:37828:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4944883Z ok (6.124s) 2023-01-11T22:10:24.4944903Z 2023-01-11T22:10:24.4945168Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4945276Z Ran 1 test in 6.124s 2023-01-11T22:10:24.4945296Z 2023-01-11T22:10:24.4945386Z OK 2023-01-11T22:10:24.4945405Z 2023-01-11T22:10:24.4945527Z Generating XML reports... 2023-01-11T22:10:24.4945963Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215416.xml 2023-01-11T22:10:24.4946325Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4946555Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4946933Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4947109Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4947129Z 2023-01-11T22:10:24.4947236Z Running tests... 2023-01-11T22:10:24.4947490Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4947792Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4948053Z test_broadcast_object_list (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4948789Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/82847 for platform(s) linux. 
If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.665s) 2023-01-11T22:10:24.4948814Z 2023-01-11T22:10:24.4949073Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4949184Z Ran 1 test in 1.665s 2023-01-11T22:10:24.4949250Z 2023-01-11T22:10:24.4949361Z OK (skipped=1) 2023-01-11T22:10:24.4949382Z 2023-01-11T22:10:24.4949487Z Generating XML reports... 2023-01-11T22:10:24.4949922Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215425.xml 2023-01-11T22:10:24.4950287Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4950463Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4950840Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4951030Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4951053Z 2023-01-11T22:10:24.4951160Z Running tests... 2023-01-11T22:10:24.4951419Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4951726Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4952018Z test_compute_bucket_assignment_by_size_sparse_error_with_logger (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4952748Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/85012 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. 
(1.649s) 2023-01-11T22:10:24.4952769Z 2023-01-11T22:10:24.4953024Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4953137Z Ran 1 test in 1.649s 2023-01-11T22:10:24.4953161Z 2023-01-11T22:10:24.4953271Z OK (skipped=1) 2023-01-11T22:10:24.4953290Z 2023-01-11T22:10:24.4953414Z Generating XML reports... 2023-01-11T22:10:24.4953856Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215429.xml 2023-01-11T22:10:24.4954226Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4954400Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4954775Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4954947Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4954966Z 2023-01-11T22:10:24.4955072Z Running tests... 2023-01-11T22:10:24.4955328Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4955626Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4955996Z test_compute_bucket_assignment_by_size_sparse_error_without_logger (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4956734Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/85339 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. 
(1.597s) 2023-01-11T22:10:24.4956756Z 2023-01-11T22:10:24.4957014Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4957122Z Ran 1 test in 1.597s 2023-01-11T22:10:24.4957141Z 2023-01-11T22:10:24.4957243Z OK (skipped=1) 2023-01-11T22:10:24.4957262Z 2023-01-11T22:10:24.4957369Z Generating XML reports... 2023-01-11T22:10:24.4957801Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215433.xml 2023-01-11T22:10:24.4958167Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4958383Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4958766Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4958952Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4958972Z 2023-01-11T22:10:24.4959078Z Running tests... 2023-01-11T22:10:24.4959331Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4959635Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4959889Z test_ddp_apply_optim_in_backward (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4960100Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 38044 2023-01-11T22:10:24.4960322Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 38045 2023-01-11T22:10:24.4960692Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4960865Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4961240Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4961423Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4961779Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4961932Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4962303Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4962492Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4962726Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4962970Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4963369Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4963755Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4963979Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4964199Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4964958Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:738: UserWarning: DDP + apply_optim_in_backward will currently set all parameter gradients to None. If this is not the desired behavior, please set env variable DDP_OVERLAPPED_OPTIM_SET_GRADS_TO_NONE=0, and manually setgradients to None/zero as desired. 2023-01-11T22:10:24.4965110Z warnings.warn( 2023-01-11T22:10:24.4965867Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:738: UserWarning: DDP + apply_optim_in_backward will currently set all parameter gradients to None. If this is not the desired behavior, please set env variable DDP_OVERLAPPED_OPTIM_SET_GRADS_TO_NONE=0, and manually setgradients to None/zero as desired. 2023-01-11T22:10:24.4965978Z warnings.warn( 2023-01-11T22:10:24.4966246Z [1673474083.375833] [7c5487d9c02b:38045:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4966480Z [1673474083.389172] [7c5487d9c02b:38045:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4966722Z [1673474083.389172] [7c5487d9c02b:38045:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4967037Z [1673474083.367891] [7c5487d9c02b:38044:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.4967269Z [1673474083.381555] [7c5487d9c02b:38044:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4967497Z [1673474083.381555] [7c5487d9c02b:38044:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4967731Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4967946Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4968174Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4968409Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4968633Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4968856Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4969080Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4969306Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4969403Z ok (8.071s) 2023-01-11T22:10:24.4969423Z 2023-01-11T22:10:24.4969684Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4969778Z Ran 1 test in 8.071s 2023-01-11T22:10:24.4969798Z 2023-01-11T22:10:24.4969885Z OK 2023-01-11T22:10:24.4969904Z 2023-01-11T22:10:24.4970026Z Generating XML reports... 
2023-01-11T22:10:24.4970466Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215438.xml 2023-01-11T22:10:24.4970836Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4971015Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4971390Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4971580Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4971600Z 2023-01-11T22:10:24.4971691Z Running tests... 2023-01-11T22:10:24.4971953Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4972257Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4972562Z test_ddp_apply_optim_in_backward_grad_as_bucket_view_false (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4972837Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 38162 2023-01-11T22:10:24.4973046Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 38163 2023-01-11T22:10:24.4973420Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4973594Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4973967Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4974140Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4974503Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4974675Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4975044Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4975234Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4975528Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4975776Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4976174Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4976774Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4977018Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4977243Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4978032Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:738: UserWarning: DDP + apply_optim_in_backward will currently set all parameter gradients to None. If this is not the desired behavior, please set env variable DDP_OVERLAPPED_OPTIM_SET_GRADS_TO_NONE=0, and manually setgradients to None/zero as desired. 2023-01-11T22:10:24.4978145Z warnings.warn( 2023-01-11T22:10:24.4978897Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:738: UserWarning: DDP + apply_optim_in_backward will currently set all parameter gradients to None. If this is not the desired behavior, please set env variable DDP_OVERLAPPED_OPTIM_SET_GRADS_TO_NONE=0, and manually setgradients to None/zero as desired. 2023-01-11T22:10:24.4979004Z warnings.warn( 2023-01-11T22:10:24.4979272Z [1673474093.907329] [7c5487d9c02b:38163:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4979500Z [1673474093.920482] [7c5487d9c02b:38163:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4979742Z [1673474093.920482] [7c5487d9c02b:38163:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4980005Z [1673474093.901918] [7c5487d9c02b:38162:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.4980215Z [1673474093.915697] [7c5487d9c02b:38162:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4980446Z [1673474093.915697] [7c5487d9c02b:38162:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4980674Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4980904Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4981228Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4981457Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.4981560Z ok (7.150s) 2023-01-11T22:10:24.4981580Z 2023-01-11T22:10:24.4981851Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4981963Z Ran 1 test in 7.150s 2023-01-11T22:10:24.4981982Z 2023-01-11T22:10:24.4982057Z OK 2023-01-11T22:10:24.4982075Z 2023-01-11T22:10:24.4982192Z Generating XML reports... 2023-01-11T22:10:24.4982633Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215448.xml 2023-01-11T22:10:24.4982997Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4983173Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4983547Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4983741Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4983761Z 2023-01-11T22:10:24.4983930Z Running tests... 
2023-01-11T22:10:24.4984184Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4984494Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4984785Z test_ddp_apply_optim_in_backward_ignored_params (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4985003Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 38280 2023-01-11T22:10:24.4985219Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 38281 2023-01-11T22:10:24.4985584Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4985762Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4986134Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4986328Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4986677Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4986850Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4987221Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4987407Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4987648Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4987889Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4988287Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4988684Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4988914Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.4989121Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.4989886Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:738: UserWarning: DDP + apply_optim_in_backward will currently set all parameter gradients to None. If this is not the desired behavior, please set env variable DDP_OVERLAPPED_OPTIM_SET_GRADS_TO_NONE=0, and manually setgradients to None/zero as desired. 2023-01-11T22:10:24.4990000Z warnings.warn( 2023-01-11T22:10:24.4990815Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:738: UserWarning: DDP + apply_optim_in_backward will currently set all parameter gradients to None. If this is not the desired behavior, please set env variable DDP_OVERLAPPED_OPTIM_SET_GRADS_TO_NONE=0, and manually setgradients to None/zero as desired. 2023-01-11T22:10:24.4990924Z warnings.warn( 2023-01-11T22:10:24.4991197Z [1673474103.716636] [7c5487d9c02b:38280:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.4991428Z [1673474103.731066] [7c5487d9c02b:38280:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4991664Z [1673474103.731066] [7c5487d9c02b:38280:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4991926Z [1673474103.716613] [7c5487d9c02b:38281:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.4992151Z [1673474103.731150] [7c5487d9c02b:38281:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.4992436Z [1673474103.731150] [7c5487d9c02b:38281:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.4992528Z ok (7.949s) 2023-01-11T22:10:24.4992548Z 2023-01-11T22:10:24.4992812Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4992925Z Ran 1 test in 7.949s 2023-01-11T22:10:24.4992945Z 2023-01-11T22:10:24.4993038Z OK 2023-01-11T22:10:24.4993057Z 2023-01-11T22:10:24.4993182Z Generating XML reports... 2023-01-11T22:10:24.4993623Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215458.xml 2023-01-11T22:10:24.4993989Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4994166Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4994526Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4994713Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4994733Z 2023-01-11T22:10:24.4994839Z Running tests... 2023-01-11T22:10:24.4995099Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.4995404Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.4995663Z test_ddp_broadcast_buffer (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.4995884Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 38400 2023-01-11T22:10:24.4996100Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 38401 2023-01-11T22:10:24.4996466Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4996623Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4996999Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4997182Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4997546Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.4997713Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.4998084Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.4998269Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.4998509Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.4998794Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.4999194Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.4999583Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.4999807Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5000029Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5000295Z [1673474114.238499] [7c5487d9c02b:38400:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5000523Z [1673474114.252220] [7c5487d9c02b:38400:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5000763Z [1673474114.252220] [7c5487d9c02b:38400:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5001043Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.5001309Z [1673474114.245347] [7c5487d9c02b:38401:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5001519Z [1673474114.258734] [7c5487d9c02b:38401:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5001747Z [1673474114.258734] [7c5487d9c02b:38401:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5001978Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.5002081Z ok (6.661s) 2023-01-11T22:10:24.5002104Z 2023-01-11T22:10:24.5002374Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5002483Z Ran 1 test in 6.661s 2023-01-11T22:10:24.5002504Z 2023-01-11T22:10:24.5002596Z OK 2023-01-11T22:10:24.5002619Z 2023-01-11T22:10:24.5002741Z Generating XML reports... 
2023-01-11T22:10:24.5003174Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215508.xml 2023-01-11T22:10:24.5003522Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5003696Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5004065Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5004250Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5004270Z 2023-01-11T22:10:24.5004378Z Running tests... 2023-01-11T22:10:24.5004630Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5004931Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5005205Z test_ddp_broadcast_buffer_via_hook (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5005406Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 38518 2023-01-11T22:10:24.5005619Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 38519 2023-01-11T22:10:24.5005981Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5006152Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5006524Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5006705Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5007118Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5007289Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5007707Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5007880Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5008126Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5008369Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5008767Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5009158Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5009391Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5009669Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5009945Z [1673474123.341358] [7c5487d9c02b:38519:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5010173Z [1673474123.354897] [7c5487d9c02b:38519:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5010391Z [1673474123.354897] [7c5487d9c02b:38519:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5010663Z [1673474123.339650] [7c5487d9c02b:38518:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5010890Z [1673474123.353269] [7c5487d9c02b:38518:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5011125Z [1673474123.353269] [7c5487d9c02b:38518:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5011358Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.5011590Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.5011814Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.5012043Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:10:24.5012144Z ok (6.611s) 2023-01-11T22:10:24.5012164Z 2023-01-11T22:10:24.5012418Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5012529Z Ran 1 test in 6.611s 2023-01-11T22:10:24.5012552Z 2023-01-11T22:10:24.5012642Z OK 2023-01-11T22:10:24.5012662Z 2023-01-11T22:10:24.5012786Z Generating XML reports... 2023-01-11T22:10:24.5013232Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215518.xml 2023-01-11T22:10:24.5013602Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5013773Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5014151Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5014339Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5014360Z 2023-01-11T22:10:24.5014452Z Running tests... 2023-01-11T22:10:24.5014715Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5015015Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5015341Z test_ddp_buffer_hook_allreduce (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5016088Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78641 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. 
(1.579s) 2023-01-11T22:10:24.5016110Z 2023-01-11T22:10:24.5016370Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5016480Z Ran 1 test in 1.579s 2023-01-11T22:10:24.5016500Z 2023-01-11T22:10:24.5016919Z OK (skipped=1) 2023-01-11T22:10:24.5016943Z 2023-01-11T22:10:24.5017074Z Generating XML reports... 2023-01-11T22:10:24.5017508Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215527.xml 2023-01-11T22:10:24.5017882Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5018053Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5018505Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5018700Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5018720Z 2023-01-11T22:10:24.5018822Z Running tests... 2023-01-11T22:10:24.5019077Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5019379Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5019666Z test_ddp_buffer_hook_allreduce_return_future (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5020386Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77261 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. 
(1.638s) 2023-01-11T22:10:24.5020428Z 2023-01-11T22:10:24.5020668Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5020772Z Ran 1 test in 1.638s 2023-01-11T22:10:24.5020792Z 2023-01-11T22:10:24.5020897Z OK (skipped=1) 2023-01-11T22:10:24.5020916Z 2023-01-11T22:10:24.5021032Z Generating XML reports... 2023-01-11T22:10:24.5021467Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215531.xml 2023-01-11T22:10:24.5021834Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5022008Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5022377Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5022569Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5022589Z 2023-01-11T22:10:24.5022683Z Running tests... 2023-01-11T22:10:24.5022941Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5023245Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5023528Z test_ddp_build_debug_param_to_name_mapping (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5023749Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 38704 2023-01-11T22:10:24.5023964Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 38705 2023-01-11T22:10:24.5024329Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5024573Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5024933Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5025121Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5025478Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5025649Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5026019Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5026198Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5026434Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5026669Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5027068Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5027488Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5027721Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5027946Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5028150Z 2023-01-11T22:10:24.5028420Z [1673474140.824187] [7c5487d9c02b:38704:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5028645Z [1673474140.837923] [7c5487d9c02b:38704:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5028884Z [1673474140.837923] [7c5487d9c02b:38704:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5029153Z [1673474140.827346] [7c5487d9c02b:38705:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5029380Z [1673474140.840445] [7c5487d9c02b:38705:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5029612Z [1673474140.840445] [7c5487d9c02b:38705:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5029699Z ok (6.157s) 2023-01-11T22:10:24.5029719Z 2023-01-11T22:10:24.5029979Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5030088Z Ran 1 test in 6.157s 2023-01-11T22:10:24.5030107Z 2023-01-11T22:10:24.5030196Z OK 2023-01-11T22:10:24.5030215Z 2023-01-11T22:10:24.5030340Z Generating XML reports... 
2023-01-11T22:10:24.5030780Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215535.xml 2023-01-11T22:10:24.5031149Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5031323Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5031678Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5031862Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5031881Z 2023-01-11T22:10:24.5031986Z Running tests... 2023-01-11T22:10:24.5032241Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5032543Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5032847Z test_ddp_build_debug_param_to_name_mapping_requires_grad (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5033120Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 38818 2023-01-11T22:10:24.5033331Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 38819 2023-01-11T22:10:24.5033696Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5033852Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5034222Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5034406Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5034767Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5034932Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5035303Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5035568Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5035814Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5036039Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5036435Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5036820Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5037044Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5037267Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5037541Z [1673474149.504277] [7c5487d9c02b:38819:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5037774Z [1673474149.517589] [7c5487d9c02b:38819:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5038011Z [1673474149.517589] [7c5487d9c02b:38819:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5038283Z [1673474149.495687] [7c5487d9c02b:38818:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5038504Z [1673474149.509413] [7c5487d9c02b:38818:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5038720Z [1673474149.509413] [7c5487d9c02b:38818:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5038821Z ok (6.116s) 2023-01-11T22:10:24.5038841Z 2023-01-11T22:10:24.5039103Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5039215Z Ran 1 test in 6.116s 2023-01-11T22:10:24.5039236Z 2023-01-11T22:10:24.5039325Z OK 2023-01-11T22:10:24.5039345Z 2023-01-11T22:10:24.5039465Z Generating XML reports... 
2023-01-11T22:10:24.5039896Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215544.xml 2023-01-11T22:10:24.5040260Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5040431Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5040790Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5040981Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5041052Z 2023-01-11T22:10:24.5041160Z Running tests... 2023-01-11T22:10:24.5041425Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5041740Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5042003Z test_ddp_comm_hook_logging (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5042222Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 38932 2023-01-11T22:10:24.5042437Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 38933 2023-01-11T22:10:24.5042786Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5042955Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5043322Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5043514Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5043927Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5044106Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5044476Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5044663Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5044904Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5045128Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5045522Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5045906Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5046132Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5046355Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5046587Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.5046816Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.5047082Z [1673474158.182477] [7c5487d9c02b:38933:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5047309Z [1673474158.195547] [7c5487d9c02b:38933:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5047530Z [1673474158.195547] [7c5487d9c02b:38933:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5047798Z [1673474158.176803] [7c5487d9c02b:38932:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5048024Z [1673474158.190521] [7c5487d9c02b:38932:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5048253Z [1673474158.190521] [7c5487d9c02b:38932:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5048351Z ok (6.619s) 2023-01-11T22:10:24.5048372Z 2023-01-11T22:10:24.5048635Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5048745Z Ran 1 test in 6.620s 2023-01-11T22:10:24.5048765Z 2023-01-11T22:10:24.5048858Z OK 2023-01-11T22:10:24.5048877Z 2023-01-11T22:10:24.5049005Z Generating XML reports... 
2023-01-11T22:10:24.5049489Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215552.xml 2023-01-11T22:10:24.5049858Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5050033Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5050407Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5050596Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5050616Z 2023-01-11T22:10:24.5050726Z Running tests... 2023-01-11T22:10:24.5050986Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5051296Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5051582Z test_ddp_control_flow_different_across_ranks (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5051785Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39050 2023-01-11T22:10:24.5052060Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39051 2023-01-11T22:10:24.5052439Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5052614Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5052988Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5053177Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5053540Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5053712Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5054073Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5054258Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5054504Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5054743Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5055135Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5055523Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5055746Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5055966Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5056232Z [1673474167.340632] [7c5487d9c02b:39050:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5056447Z [1673474167.354446] [7c5487d9c02b:39050:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5056927Z [1673474167.354446] [7c5487d9c02b:39050:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5057203Z [1673474167.340609] [7c5487d9c02b:39051:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5057421Z [1673474167.354470] [7c5487d9c02b:39051:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5057651Z [1673474167.354470] [7c5487d9c02b:39051:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5058512Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. 
(function operator()) 2023-01-11T22:10:24.5058614Z ok (6.563s) 2023-01-11T22:10:24.5058635Z 2023-01-11T22:10:24.5058910Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5059026Z Ran 1 test in 6.563s 2023-01-11T22:10:24.5059045Z 2023-01-11T22:10:24.5059138Z OK 2023-01-11T22:10:24.5059157Z 2023-01-11T22:10:24.5059283Z Generating XML reports... 2023-01-11T22:10:24.5059709Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215602.xml 2023-01-11T22:10:24.5060077Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5060314Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5060704Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5060892Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5060912Z 2023-01-11T22:10:24.5061021Z Running tests... 2023-01-11T22:10:24.5061274Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5061579Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5061855Z test_ddp_control_flow_same_across_ranks (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5062578Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78235 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. 
(1.628s) 2023-01-11T22:10:24.5062620Z 2023-01-11T22:10:24.5062864Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5062975Z Ran 1 test in 1.628s 2023-01-11T22:10:24.5062995Z 2023-01-11T22:10:24.5063100Z OK (skipped=1) 2023-01-11T22:10:24.5063120Z 2023-01-11T22:10:24.5063241Z Generating XML reports... 2023-01-11T22:10:24.5063684Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215611.xml 2023-01-11T22:10:24.5064048Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5064223Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5064598Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5064783Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5064804Z 2023-01-11T22:10:24.5064895Z Running tests... 2023-01-11T22:10:24.5065156Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5065463Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5065719Z test_ddp_create_graph (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5065928Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39202 2023-01-11T22:10:24.5066135Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39203 2023-01-11T22:10:24.5066495Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5066727Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5067090Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5067279Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5067643Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5067816Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5068192Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5068383Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5068623Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5068864Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5069260Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5069685Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5069919Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5070141Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5070411Z [1673474179.266687] [7c5487d9c02b:39202:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5070635Z [1673474180.707056] [7c5487d9c02b:39202:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5070870Z [1673474180.707056] [7c5487d9c02b:39202:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5071764Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T22:10:24.5072034Z [1673474179.286571] [7c5487d9c02b:39203:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5072262Z [1673474180.737147] [7c5487d9c02b:39203:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5072497Z [1673474180.737147] [7c5487d9c02b:39203:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5073377Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. 
If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T22:10:24.5074522Z /opt/conda/lib/python3.10/site-packages/torch/autograd/__init__.py:197: UserWarning: Using backward() with create_graph=True will create a reference cycle between the parameter and its gradient which can cause a memory leak. We recommend using autograd.grad when creating the graph to avoid this. If you have to use this function, make sure to reset the .grad fields of your parameters to None after use to break the cycle and avoid the leak. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/engine.cpp:1134.) 2023-01-11T22:10:24.5074748Z Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2023-01-11T22:10:24.5075945Z /opt/conda/lib/python3.10/site-packages/torch/autograd/__init__.py:197: UserWarning: Using backward() with create_graph=True will create a reference cycle between the parameter and its gradient which can cause a memory leak. We recommend using autograd.grad when creating the graph to avoid this. If you have to use this function, make sure to reset the .grad fields of your parameters to None after use to break the cycle and avoid the leak. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/engine.cpp:1134.) 2023-01-11T22:10:24.5076167Z Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2023-01-11T22:10:24.5076401Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.5076619Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.5077543Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. 
If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T22:10:24.5078420Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T22:10:24.5079280Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T22:10:24.5080142Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T22:10:24.5080989Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T22:10:24.5081842Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. 
If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T22:10:24.5082695Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T22:10:24.5083539Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T22:10:24.5084445Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T22:10:24.5085299Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. 
If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T22:10:24.5085386Z ok (6.265s) 2023-01-11T22:10:24.5085406Z 2023-01-11T22:10:24.5085666Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5085779Z Ran 1 test in 6.265s 2023-01-11T22:10:24.5085798Z 2023-01-11T22:10:24.5085888Z OK 2023-01-11T22:10:24.5085908Z 2023-01-11T22:10:24.5086034Z Generating XML reports... 2023-01-11T22:10:24.5086525Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215615.xml 2023-01-11T22:10:24.5086899Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5087073Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5087452Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5087626Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5087646Z 2023-01-11T22:10:24.5087752Z Running tests... 2023-01-11T22:10:24.5088005Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5088318Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5088563Z test_ddp_device (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5089299Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77324 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. 
(1.594s) 2023-01-11T22:10:24.5089320Z 2023-01-11T22:10:24.5089578Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5089687Z Ran 1 test in 1.595s 2023-01-11T22:10:24.5089706Z 2023-01-11T22:10:24.5089807Z OK (skipped=1) 2023-01-11T22:10:24.5089827Z 2023-01-11T22:10:24.5089933Z Generating XML reports... 2023-01-11T22:10:24.5090365Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215624.xml 2023-01-11T22:10:24.5090731Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5090906Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5091283Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5091466Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5091486Z 2023-01-11T22:10:24.5091593Z Running tests... 2023-01-11T22:10:24.5091852Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5092162Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5092413Z test_ddp_forward_backward_hook (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5092634Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39350 2023-01-11T22:10:24.5092907Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39351 2023-01-11T22:10:24.5093281Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5093454Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5093826Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5094012Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5094372Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5094529Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5094904Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5095099Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5095338Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5095622Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5096026Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5096419Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5096878Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5097113Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5097904Z /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1331: UserWarning: Using a non-full backward hook when the forward contains multiple autograd Nodes is deprecated and will be removed in future versions. This hook will be missing some grad_input. Please use register_full_backward_hook to get the documented behavior. 2023-01-11T22:10:24.5098221Z warnings.warn("Using a non-full backward hook when the forward contains multiple autograd Nodes " 2023-01-11T22:10:24.5098991Z /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1331: UserWarning: Using a non-full backward hook when the forward contains multiple autograd Nodes is deprecated and will be removed in future versions. This hook will be missing some grad_input. Please use register_full_backward_hook to get the documented behavior. 2023-01-11T22:10:24.5099312Z warnings.warn("Using a non-full backward hook when the forward contains multiple autograd Nodes " 2023-01-11T22:10:24.5099583Z [1673474193.464103] [7c5487d9c02b:39350:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5099814Z [1673474193.477928] [7c5487d9c02b:39350:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5100050Z [1673474193.477928] [7c5487d9c02b:39350:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5100311Z [1673474193.468790] [7c5487d9c02b:39351:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.5100529Z [1673474193.482195] [7c5487d9c02b:39351:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5100757Z [1673474193.482195] [7c5487d9c02b:39351:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5100856Z ok (6.646s) 2023-01-11T22:10:24.5100876Z 2023-01-11T22:10:24.5101140Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5101327Z Ran 1 test in 6.647s 2023-01-11T22:10:24.5101347Z 2023-01-11T22:10:24.5101432Z OK 2023-01-11T22:10:24.5101451Z 2023-01-11T22:10:24.5101568Z Generating XML reports... 2023-01-11T22:10:24.5102020Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215628.xml 2023-01-11T22:10:24.5102387Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5102566Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5102940Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5103151Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5103171Z 2023-01-11T22:10:24.5103261Z Running tests... 2023-01-11T22:10:24.5103555Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5103927Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5104311Z test_ddp_grad_div_uneven_inputs (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5105115Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78685 for platform(s) linux. 
If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.618s) 2023-01-11T22:10:24.5105136Z 2023-01-11T22:10:24.5105434Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5105599Z Ran 1 test in 1.619s 2023-01-11T22:10:24.5105619Z 2023-01-11T22:10:24.5105764Z OK (skipped=1) 2023-01-11T22:10:24.5105783Z 2023-01-11T22:10:24.5105906Z Generating XML reports... 2023-01-11T22:10:24.5106385Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215637.xml 2023-01-11T22:10:24.5106789Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5107020Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5107448Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5107684Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5107705Z 2023-01-11T22:10:24.5107817Z Running tests... 2023-01-11T22:10:24.5108080Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5108391Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5108664Z test_ddp_hook_parity_allreduce (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5109403Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77293 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. 
(1.622s) 2023-01-11T22:10:24.5109427Z 2023-01-11T22:10:24.5109684Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5109780Z Ran 1 test in 1.622s 2023-01-11T22:10:24.5109799Z 2023-01-11T22:10:24.5109908Z OK (skipped=1) 2023-01-11T22:10:24.5109927Z 2023-01-11T22:10:24.5110048Z Generating XML reports... 2023-01-11T22:10:24.5110488Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215641.xml 2023-01-11T22:10:24.5110853Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5111023Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5111461Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5111644Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5111664Z 2023-01-11T22:10:24.5111756Z Running tests... 2023-01-11T22:10:24.5112011Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5112316Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5112598Z test_ddp_hook_parity_allreduce_process_group (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5112814Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39566 2023-01-11T22:10:24.5113026Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39567 2023-01-11T22:10:24.5113392Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5113563Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5113999Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5114178Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5114544Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5114712Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5115087Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5115273Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5115512Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5115755Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5116154Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5116531Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5116758Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5116992Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:10:24.5117212Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5117441Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:10:24.5117828Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.5118213Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.5118489Z [1673474211.000168] [7c5487d9c02b:39566:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5118720Z [1673474211.013706] [7c5487d9c02b:39566:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5118955Z [1673474211.013706] [7c5487d9c02b:39566:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5119204Z [1673474211.006485] [7c5487d9c02b:39567:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5119430Z [1673474211.019177] [7c5487d9c02b:39567:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5119720Z [1673474211.019177] [7c5487d9c02b:39567:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5119956Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.5120189Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:10:24.5120417Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.5120646Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.5120748Z ok (6.848s) 2023-01-11T22:10:24.5120768Z 2023-01-11T22:10:24.5121039Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5121133Z Ran 1 test in 6.848s 2023-01-11T22:10:24.5121153Z 2023-01-11T22:10:24.5121241Z OK 2023-01-11T22:10:24.5121261Z 2023-01-11T22:10:24.5121380Z Generating XML reports... 2023-01-11T22:10:24.5121823Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215645.xml 2023-01-11T22:10:24.5122235Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5122416Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5122792Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5122980Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5122999Z 2023-01-11T22:10:24.5123106Z Running tests... 2023-01-11T22:10:24.5123350Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5123658Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5123930Z test_ddp_hook_parity_post_localSGD (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5124149Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39684 2023-01-11T22:10:24.5124368Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39685 2023-01-11T22:10:24.5124738Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5124914Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5125286Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5125457Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5125819Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5125992Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5126372Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5126557Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5126804Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5127046Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5127442Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5127830Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5128040Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5128312Z INFO:torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook:Local SGD will be started after 10 iterations 2023-01-11T22:10:24.5128590Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5128861Z INFO:torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook:Local SGD will be started after 10 iterations 2023-01-11T22:10:24.5129133Z [1673474220.263603] [7c5487d9c02b:39684:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5129361Z [1673474220.276941] [7c5487d9c02b:39684:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5129595Z [1673474220.276941] [7c5487d9c02b:39684:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5129864Z [1673474220.272630] [7c5487d9c02b:39685:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5130092Z [1673474220.285986] [7c5487d9c02b:39685:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5130365Z [1673474220.285986] [7c5487d9c02b:39685:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5130591Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.5130822Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.5131054Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.5131281Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:10:24.5131552Z INFO:torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook:Start to apply local SGD after 10 iterations. 2023-01-11T22:10:24.5131825Z INFO:torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook:Start to apply local SGD after 10 iterations. 2023-01-11T22:10:24.5132097Z INFO:torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook:Local SGD will be started after 10 iterations 2023-01-11T22:10:24.5132365Z INFO:torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook:Local SGD will be started after 10 iterations 2023-01-11T22:10:24.5132595Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.5132809Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.5133038Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.5133267Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.5133531Z INFO:torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook:Start to apply local SGD after 10 iterations. 2023-01-11T22:10:24.5133797Z INFO:torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook:Start to apply local SGD after 10 iterations. 2023-01-11T22:10:24.5134073Z INFO:torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook:Local SGD will be started after 1000 iterations 2023-01-11T22:10:24.5134340Z INFO:torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook:Local SGD will be started after 1000 iterations 2023-01-11T22:10:24.5134566Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.5134777Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.5135007Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:10:24.5135235Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.5135336Z ok (7.223s) 2023-01-11T22:10:24.5135356Z 2023-01-11T22:10:24.5135624Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5135737Z Ran 1 test in 7.223s 2023-01-11T22:10:24.5135809Z 2023-01-11T22:10:24.5135902Z OK 2023-01-11T22:10:24.5135921Z 2023-01-11T22:10:24.5136046Z Generating XML reports... 2023-01-11T22:10:24.5136492Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215655.xml 2023-01-11T22:10:24.5137090Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5137265Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5137642Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5137824Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5137845Z 2023-01-11T22:10:24.5137952Z Running tests... 2023-01-11T22:10:24.5138214Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5138516Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5138784Z test_ddp_hook_parity_powerSGD (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5139596Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77378 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. 
(1.631s) 2023-01-11T22:10:24.5139621Z 2023-01-11T22:10:24.5139887Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5139982Z Ran 1 test in 1.631s 2023-01-11T22:10:24.5140003Z 2023-01-11T22:10:24.5140106Z OK (skipped=1) 2023-01-11T22:10:24.5140125Z 2023-01-11T22:10:24.5140242Z Generating XML reports... 2023-01-11T22:10:24.5140681Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215704.xml 2023-01-11T22:10:24.5141054Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5141229Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5141605Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5141794Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5141814Z 2023-01-11T22:10:24.5141905Z Running tests... 2023-01-11T22:10:24.5142165Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5142472Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5142739Z test_ddp_hook_pickling_powerSGD (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5142956Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39836 2023-01-11T22:10:24.5143177Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39837 2023-01-11T22:10:24.5143547Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5143723Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5144079Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5144263Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5144620Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5144792Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5145162Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5145420Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5145663Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5145907Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5146302Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5146675Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5146901Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5147434Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 4; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2023-01-11T22:10:24.5147660Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5148234Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 4; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2023-01-11T22:10:24.5148472Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.5148701Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.5148971Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:Start to apply PowerSGD after 4 iterations. 2023-01-11T22:10:24.5149240Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:Start to apply PowerSGD after 4 iterations. 2023-01-11T22:10:24.5149537Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:A zero tensor of length 10 that represents local error is created. 2023-01-11T22:10:24.5149832Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:A zero tensor of length 10 that represents local error is created. 
2023-01-11T22:10:24.5150137Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:Compression stats: iter 4, total before compression 10, total after compression 10, rate 1.0 2023-01-11T22:10:24.5150448Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:Allocating contiguous memory of length 0 for Ps, and of length 0 for Qs, respectively. 2023-01-11T22:10:24.5150765Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:Compression stats: iter 4, total before compression 10, total after compression 10, rate 1.0 2023-01-11T22:10:24.5151081Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:Allocating contiguous memory of length 0 for Ps, and of length 0 for Qs, respectively. 2023-01-11T22:10:24.5151358Z [1673474234.342641] [7c5487d9c02b:39836:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5151590Z [1673474234.356471] [7c5487d9c02b:39836:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5151823Z [1673474234.356471] [7c5487d9c02b:39836:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5152088Z [1673474234.348106] [7c5487d9c02b:39837:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5152314Z [1673474234.361728] [7c5487d9c02b:39837:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5152541Z [1673474234.361728] [7c5487d9c02b:39837:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5152829Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.5153048Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:10:24.5153145Z ok (6.734s) 2023-01-11T22:10:24.5153164Z 2023-01-11T22:10:24.5153431Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5153541Z Ran 1 test in 6.734s 2023-01-11T22:10:24.5153561Z 2023-01-11T22:10:24.5153651Z OK 2023-01-11T22:10:24.5153670Z 2023-01-11T22:10:24.5153787Z Generating XML reports... 2023-01-11T22:10:24.5154233Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215709.xml 2023-01-11T22:10:24.5154597Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5154771Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5155129Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5155360Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5155381Z 2023-01-11T22:10:24.5155486Z Running tests... 2023-01-11T22:10:24.5155741Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5156044Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5156429Z test_ddp_hook_with_optimizer_parity_adam_optimize_subset_False (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T22:10:24.5156450Z 2023-01-11T22:10:24.5156701Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5156806Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5156828Z 2023-01-11T22:10:24.5156935Z OK (skipped=1) 2023-01-11T22:10:24.5156955Z 2023-01-11T22:10:24.5157061Z Generating XML reports... 
2023-01-11T22:10:24.5157500Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215718.xml 2023-01-11T22:10:24.5157864Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5158038Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5158414Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5158601Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5158621Z 2023-01-11T22:10:24.5158723Z Running tests... 2023-01-11T22:10:24.5158979Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5159268Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5159655Z test_ddp_hook_with_optimizer_parity_adam_optimize_subset_True (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T22:10:24.5159676Z 2023-01-11T22:10:24.5159935Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5160041Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5160060Z 2023-01-11T22:10:24.5160165Z OK (skipped=1) 2023-01-11T22:10:24.5160184Z 2023-01-11T22:10:24.5160303Z Generating XML reports... 
2023-01-11T22:10:24.5160738Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215720.xml 2023-01-11T22:10:24.5161104Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5161278Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5161697Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5161885Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5161904Z 2023-01-11T22:10:24.5162010Z Running tests... 2023-01-11T22:10:24.5162267Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5162573Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5163077Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_False_optimize_subset_False (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T22:10:24.5163099Z 2023-01-11T22:10:24.5163359Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5163469Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5163492Z 2023-01-11T22:10:24.5163597Z OK (skipped=1) 2023-01-11T22:10:24.5163616Z 2023-01-11T22:10:24.5163738Z Generating XML reports... 
2023-01-11T22:10:24.5164210Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215723.xml 2023-01-11T22:10:24.5164580Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5164750Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5165123Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5165310Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5165329Z 2023-01-11T22:10:24.5165432Z Running tests... 2023-01-11T22:10:24.5165689Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5165994Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5166440Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_False_optimize_subset_True (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T22:10:24.5166461Z 2023-01-11T22:10:24.5166704Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5166815Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5166834Z 2023-01-11T22:10:24.5166939Z OK (skipped=1) 2023-01-11T22:10:24.5166958Z 2023-01-11T22:10:24.5167080Z Generating XML reports... 
2023-01-11T22:10:24.5167516Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215725.xml 2023-01-11T22:10:24.5167878Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5168051Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5168427Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5168612Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5168632Z 2023-01-11T22:10:24.5168723Z Running tests... 2023-01-11T22:10:24.5168980Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5169283Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5169720Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_True_optimize_subset_False (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T22:10:24.5169740Z 2023-01-11T22:10:24.5169995Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5170190Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5170210Z 2023-01-11T22:10:24.5170311Z OK (skipped=1) 2023-01-11T22:10:24.5170330Z 2023-01-11T22:10:24.5170450Z Generating XML reports... 
2023-01-11T22:10:24.5170894Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215727.xml 2023-01-11T22:10:24.5171244Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5171412Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5171780Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5171967Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5171986Z 2023-01-11T22:10:24.5172090Z Running tests... 2023-01-11T22:10:24.5172341Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5172644Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5173138Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_True_optimize_subset_True (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T22:10:24.5173160Z 2023-01-11T22:10:24.5173427Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5173521Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5173554Z 2023-01-11T22:10:24.5173644Z OK (skipped=1) 2023-01-11T22:10:24.5173663Z 2023-01-11T22:10:24.5173785Z Generating XML reports... 
2023-01-11T22:10:24.5174222Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215730.xml 2023-01-11T22:10:24.5174586Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5174763Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5175140Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5175328Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5175347Z 2023-01-11T22:10:24.5175456Z Running tests... 2023-01-11T22:10:24.5175698Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5176004Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5176441Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_False_optimize_subset_False (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T22:10:24.5176461Z 2023-01-11T22:10:24.5176958Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5177072Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5177093Z 2023-01-11T22:10:24.5177203Z OK (skipped=1) 2023-01-11T22:10:24.5177223Z 2023-01-11T22:10:24.5177351Z Generating XML reports... 
2023-01-11T22:10:24.5177790Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215732.xml 2023-01-11T22:10:24.5178157Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5178315Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5178688Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5178873Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5178893Z 2023-01-11T22:10:24.5178998Z Running tests... 2023-01-11T22:10:24.5179254Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5179659Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5180099Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_False_optimize_subset_True (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T22:10:24.5180120Z 2023-01-11T22:10:24.5180373Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5180480Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5180500Z 2023-01-11T22:10:24.5180608Z OK (skipped=1) 2023-01-11T22:10:24.5180628Z 2023-01-11T22:10:24.5180732Z Generating XML reports... 
2023-01-11T22:10:24.5181169Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215735.xml 2023-01-11T22:10:24.5181536Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5181712Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5182151Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5182349Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5182369Z 2023-01-11T22:10:24.5182477Z Running tests... 2023-01-11T22:10:24.5182741Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5183031Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5183469Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_True_optimize_subset_False (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T22:10:24.5183489Z 2023-01-11T22:10:24.5183752Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5183863Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5183883Z 2023-01-11T22:10:24.5183986Z OK (skipped=1) 2023-01-11T22:10:24.5184006Z 2023-01-11T22:10:24.5184126Z Generating XML reports... 
2023-01-11T22:10:24.5184558Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215737.xml 2023-01-11T22:10:24.5184917Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5185092Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5185466Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5185638Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5185658Z 2023-01-11T22:10:24.5185765Z Running tests... 2023-01-11T22:10:24.5186026Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5186325Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5186754Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_True_optimize_subset_True (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T22:10:24.5186776Z 2023-01-11T22:10:24.5187032Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5187140Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5187159Z 2023-01-11T22:10:24.5187262Z OK (skipped=1) 2023-01-11T22:10:24.5187281Z 2023-01-11T22:10:24.5187403Z Generating XML reports... 
2023-01-11T22:10:24.5187825Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215739.xml 2023-01-11T22:10:24.5188190Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5188425Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5188805Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5188991Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5189011Z 2023-01-11T22:10:24.5189117Z Running tests... 2023-01-11T22:10:24.5189373Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5189676Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5190055Z test_ddp_hook_with_optimizer_parity_sgd_optimize_subset_False (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T22:10:24.5190076Z 2023-01-11T22:10:24.5190316Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5190421Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5190441Z 2023-01-11T22:10:24.5190548Z OK (skipped=1) 2023-01-11T22:10:24.5190567Z 2023-01-11T22:10:24.5190733Z Generating XML reports... 
2023-01-11T22:10:24.5191176Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215742.xml 2023-01-11T22:10:24.5191533Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5191705Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5192074Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5192255Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5192275Z 2023-01-11T22:10:24.5192367Z Running tests... 2023-01-11T22:10:24.5192625Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5192927Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5193307Z test_ddp_hook_with_optimizer_parity_sgd_optimize_subset_True (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T22:10:24.5193327Z 2023-01-11T22:10:24.5193582Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5193695Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5193714Z 2023-01-11T22:10:24.5193820Z OK (skipped=1) 2023-01-11T22:10:24.5193839Z 2023-01-11T22:10:24.5193957Z Generating XML reports... 
2023-01-11T22:10:24.5194387Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215744.xml 2023-01-11T22:10:24.5194732Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5194905Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5195277Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5195460Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5195479Z 2023-01-11T22:10:24.5195588Z Running tests... 2023-01-11T22:10:24.5195845Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5196147Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5196407Z test_ddp_ignore_params_arg (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5197143Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77325 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.660s) 2023-01-11T22:10:24.5197217Z 2023-01-11T22:10:24.5197488Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5197584Z Ran 1 test in 1.660s 2023-01-11T22:10:24.5197603Z 2023-01-11T22:10:24.5197708Z OK (skipped=1) 2023-01-11T22:10:24.5197727Z 2023-01-11T22:10:24.5197847Z Generating XML reports... 
2023-01-11T22:10:24.5198289Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215747.xml 2023-01-11T22:10:24.5198656Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5198828Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5199198Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5199386Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5199405Z 2023-01-11T22:10:24.5199497Z Running tests... 2023-01-11T22:10:24.5199808Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5200122Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5200373Z test_ddp_inference (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5200591Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 40384 2023-01-11T22:10:24.5200805Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 40385 2023-01-11T22:10:24.5201169Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5201340Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5201714Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5201889Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5202254Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5202428Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5202803Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5202984Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5203217Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5203453Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5203845Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5204223Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5204450Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5204668Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5204934Z [1673474276.514059] [7c5487d9c02b:40384:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5205162Z [1673474276.527390] [7c5487d9c02b:40384:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5205396Z [1673474276.527390] [7c5487d9c02b:40384:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5205664Z [1673474276.514874] [7c5487d9c02b:40385:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5205956Z [1673474276.528362] [7c5487d9c02b:40385:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5206189Z [1673474276.528362] [7c5487d9c02b:40385:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5206287Z ok (6.863s) 2023-01-11T22:10:24.5206307Z 2023-01-11T22:10:24.5206556Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5206659Z Ran 1 test in 6.863s 2023-01-11T22:10:24.5206678Z 2023-01-11T22:10:24.5206766Z OK 2023-01-11T22:10:24.5206786Z 2023-01-11T22:10:24.5206906Z Generating XML reports... 
2023-01-11T22:10:24.5207344Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215751.xml 2023-01-11T22:10:24.5207763Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5207946Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5208376Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5208568Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5208589Z 2023-01-11T22:10:24.5208680Z Running tests... 2023-01-11T22:10:24.5208939Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5209239Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5209503Z test_ddp_join_model_equivalence (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5209717Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 40498 2023-01-11T22:10:24.5209930Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 40499 2023-01-11T22:10:24.5210296Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5210472Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5210829Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5211018Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5211378Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5211550Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5211921Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5212102Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5212345Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5212589Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5212983Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5213356Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5213583Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5213805Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5214033Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.5214261Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.5214589Z [1673474286.390964] [7c5487d9c02b:40499:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5214826Z [1673474286.404261] [7c5487d9c02b:40499:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5215063Z [1673474286.404261] [7c5487d9c02b:40499:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5215331Z [1673474286.390787] [7c5487d9c02b:40498:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.5215539Z [1673474286.404286] [7c5487d9c02b:40498:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5215773Z [1673474286.404286] [7c5487d9c02b:40498:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5216188Z /opt/conda/lib/python3.10/tempfile.py:860: ResourceWarning: Implicitly cleaning up 2023-01-11T22:10:24.5216396Z _warnings.warn(warn_message, ResourceWarning) 2023-01-11T22:10:24.5217056Z /opt/conda/lib/python3.10/tempfile.py:860: ResourceWarning: Implicitly cleaning up 2023-01-11T22:10:24.5217221Z _warnings.warn(warn_message, ResourceWarning) 2023-01-11T22:10:24.5217318Z ok (6.560s) 2023-01-11T22:10:24.5217339Z 2023-01-11T22:10:24.5217608Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5217713Z Ran 1 test in 6.560s 2023-01-11T22:10:24.5217733Z 2023-01-11T22:10:24.5217807Z OK 2023-01-11T22:10:24.5217826Z 2023-01-11T22:10:24.5217948Z Generating XML reports... 2023-01-11T22:10:24.5218387Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215800.xml 2023-01-11T22:10:24.5218756Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5218930Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5219306Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5219496Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5219516Z 2023-01-11T22:10:24.5219624Z Running tests... 
2023-01-11T22:10:24.5219870Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5220177Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5220434Z test_ddp_logging_data_cpu (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5220652Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 40616 2023-01-11T22:10:24.5220872Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 40617 2023-01-11T22:10:24.5221242Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5221416Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5221790Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5221977Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5222323Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5222495Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5222863Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5223051Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5223387Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5223635Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5224033Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5224423Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5224650Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5224852Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5225081Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.5225308Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.5225638Z [1673474293.643186] [7c5487d9c02b:40616:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5225877Z [1673474295.077314] [7c5487d9c02b:40616:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5226111Z [1673474295.077314] [7c5487d9c02b:40616:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5226385Z [1673474293.645704] [7c5487d9c02b:40617:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5226613Z [1673474295.101832] [7c5487d9c02b:40617:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5226844Z [1673474295.101832] [7c5487d9c02b:40617:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5226932Z ok (6.215s) 2023-01-11T22:10:24.5226968Z 2023-01-11T22:10:24.5227222Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5227336Z Ran 1 test in 6.215s 2023-01-11T22:10:24.5227356Z 2023-01-11T22:10:24.5227442Z OK 2023-01-11T22:10:24.5227462Z 2023-01-11T22:10:24.5227579Z Generating XML reports... 
2023-01-11T22:10:24.5228017Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215809.xml 2023-01-11T22:10:24.5228384Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5228560Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5228932Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5229105Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5229128Z 2023-01-11T22:10:24.5229230Z Running tests... 2023-01-11T22:10:24.5229491Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5229797Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5230060Z test_ddp_logging_data_gpu (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5230280Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 40760 2023-01-11T22:10:24.5230496Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 40761 2023-01-11T22:10:24.5230861Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5231019Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5231386Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5231631Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5231998Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5232168Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5232542Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5232730Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5232967Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5233209Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5233587Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5233981Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5234266Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5234499Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5234771Z [1673474303.769464] [7c5487d9c02b:40761:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5235001Z [1673474303.782623] [7c5487d9c02b:40761:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5235236Z [1673474303.782623] [7c5487d9c02b:40761:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5235506Z [1673474303.764679] [7c5487d9c02b:40760:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5235739Z [1673474303.778275] [7c5487d9c02b:40760:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5235974Z [1673474303.778275] [7c5487d9c02b:40760:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5236192Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.5236425Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.5236527Z ok (6.571s) 2023-01-11T22:10:24.5236545Z 2023-01-11T22:10:24.5236815Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5236926Z Ran 1 test in 6.571s 2023-01-11T22:10:24.5236946Z 2023-01-11T22:10:24.5237037Z OK 2023-01-11T22:10:24.5237056Z 2023-01-11T22:10:24.5237181Z Generating XML reports... 
2023-01-11T22:10:24.5237628Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215818.xml 2023-01-11T22:10:24.5237980Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5238156Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5238533Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5238719Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5238739Z 2023-01-11T22:10:24.5238847Z Running tests... 2023-01-11T22:10:24.5239107Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5239415Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5239700Z test_ddp_model_diff_num_params_across_ranks (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5239974Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 40878 2023-01-11T22:10:24.5240175Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 40879 2023-01-11T22:10:24.5240548Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5240721Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5241089Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5241275Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5241633Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5241806Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5242183Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5242402Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5242649Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5242889Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5243288Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5243679Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5243906Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5244128Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5244367Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:10:24.5244600Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:10:24.5244971Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.5245359Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.5245596Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:10:24.5245833Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:10:24.5246212Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:10:24.5246600Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:10:24.5246875Z [1673474312.916084] [7c5487d9c02b:40879:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5247106Z [1673474312.929143] [7c5487d9c02b:40879:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5247337Z [1673474312.929143] [7c5487d9c02b:40879:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5247603Z [1673474312.908546] [7c5487d9c02b:40878:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. 
Fall back to non cooperative launch. 2023-01-11T22:10:24.5247812Z [1673474312.922085] [7c5487d9c02b:40878:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5248042Z [1673474312.922085] [7c5487d9c02b:40878:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5248202Z ok (6.114s) 2023-01-11T22:10:24.5248222Z 2023-01-11T22:10:24.5248491Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5248604Z Ran 1 test in 6.115s 2023-01-11T22:10:24.5248623Z 2023-01-11T22:10:24.5248711Z OK 2023-01-11T22:10:24.5248730Z 2023-01-11T22:10:24.5248854Z Generating XML reports... 2023-01-11T22:10:24.5249293Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215827.xml 2023-01-11T22:10:24.5249657Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5249814Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5250190Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5250381Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5250400Z 2023-01-11T22:10:24.5250504Z Running tests... 2023-01-11T22:10:24.5250811Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5251125Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5251403Z test_ddp_model_diff_shape_across_ranks (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5251619Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 40998 2023-01-11T22:10:24.5251816Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 40999 2023-01-11T22:10:24.5252177Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5252347Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5252715Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5252884Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5253257Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5253442Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5253810Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5253995Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5254219Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5254459Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5254849Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5255241Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5255465Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5255685Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5255915Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:10:24.5256150Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:10:24.5256879Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.5257287Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.5257612Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:10:24.5257855Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:10:24.5258245Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:10:24.5258626Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:10:24.5258900Z [1673474321.690674] [7c5487d9c02b:40998:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5259131Z [1673474321.704493] [7c5487d9c02b:40998:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5259368Z [1673474321.704493] [7c5487d9c02b:40998:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5259740Z [1673474332.022914] [7c5487d9c02b:40998:1] ucc_schedule.h:189 UCC WARN timeout 10 sec. 
has expired on req 0x32ab6280, seq_num 3, TL_UCP, team_id 1, size 2, rank 0, ctx_rank 0: Barrier n/a inplace=0 bytes=0 2023-01-11T22:10:24.5260025Z [1673474332.053982] [7c5487d9c02b:40998:0] mpool.c:55 UCX WARN object 0x32bc77c0 {flags:0x20040 recv length 0 host memory} was not returned to mpool ucp_requests 2023-01-11T22:10:24.5260291Z [1673474321.690647] [7c5487d9c02b:40999:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5260502Z [1673474321.704489] [7c5487d9c02b:40999:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5260736Z [1673474321.704489] [7c5487d9c02b:40999:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5261127Z [1673474332.063926] [7c5487d9c02b:40999:0] tag_match.c:62 UCX WARN unexpected tag-receive descriptor 0x37ce0380 was not matched 2023-01-11T22:10:24.5261229Z ok (16.281s) 2023-01-11T22:10:24.5261252Z 2023-01-11T22:10:24.5261515Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5261626Z Ran 1 test in 16.281s 2023-01-11T22:10:24.5261645Z 2023-01-11T22:10:24.5261736Z OK 2023-01-11T22:10:24.5261756Z 2023-01-11T22:10:24.5261878Z Generating XML reports... 
2023-01-11T22:10:24.5262315Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215836.xml 2023-01-11T22:10:24.5262667Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5262840Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5263208Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5263396Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5263417Z 2023-01-11T22:10:24.5263523Z Running tests... 2023-01-11T22:10:24.5263789Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5264096Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5264396Z test_ddp_multiple_nested_unused_params_err_ignore_params (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5264597Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41118 2023-01-11T22:10:24.5264818Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41119 2023-01-11T22:10:24.5265182Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5265412Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5265787Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5265973Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5266334Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5266506Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5266877Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5267047Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5267285Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5267525Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5267924Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5268356Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5268589Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5268816Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5269084Z [1673474340.460091] [7c5487d9c02b:41118:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5269314Z [1673474340.473637] [7c5487d9c02b:41118:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5269531Z [1673474340.473637] [7c5487d9c02b:41118:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5269806Z [1673474340.465480] [7c5487d9c02b:41119:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5270031Z [1673474340.478890] [7c5487d9c02b:41119:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5270263Z [1673474340.478890] [7c5487d9c02b:41119:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5270361Z ok (7.033s) 2023-01-11T22:10:24.5270381Z 2023-01-11T22:10:24.5270645Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5270755Z Ran 1 test in 7.033s 2023-01-11T22:10:24.5270774Z 2023-01-11T22:10:24.5270865Z OK 2023-01-11T22:10:24.5270884Z 2023-01-11T22:10:24.5271004Z Generating XML reports... 
2023-01-11T22:10:24.5271427Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215855.xml 2023-01-11T22:10:24.5271795Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5271973Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5272344Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5272531Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5272551Z 2023-01-11T22:10:24.5272661Z Running tests... 2023-01-11T22:10:24.5272922Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5273229Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5273515Z test_ddp_multiple_nested_unused_params_error (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5273772Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41236 2023-01-11T22:10:24.5273985Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41237 2023-01-11T22:10:24.5274356Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5274528Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5274902Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5275089Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5275449Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5275615Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5275970Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5276154Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5276442Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5276687Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5277078Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5277465Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5277688Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5277910Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5278178Z [1673474349.967830] [7c5487d9c02b:41237:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5278413Z [1673474349.981086] [7c5487d9c02b:41237:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5278632Z [1673474349.981086] [7c5487d9c02b:41237:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5278893Z [1673474349.959679] [7c5487d9c02b:41236:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5279116Z [1673474349.973231] [7c5487d9c02b:41236:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5279350Z [1673474349.973231] [7c5487d9c02b:41236:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5279451Z ok (6.941s) 2023-01-11T22:10:24.5279474Z 2023-01-11T22:10:24.5279730Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5279843Z Ran 1 test in 6.941s 2023-01-11T22:10:24.5279862Z 2023-01-11T22:10:24.5279952Z OK 2023-01-11T22:10:24.5279973Z 2023-01-11T22:10:24.5280099Z Generating XML reports... 
2023-01-11T22:10:24.5280520Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215904.xml 2023-01-11T22:10:24.5280883Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5281052Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5281419Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5281609Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5281627Z 2023-01-11T22:10:24.5281785Z Running tests... 2023-01-11T22:10:24.5282051Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5282358Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5282597Z test_ddp_namedtuple (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5282814Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41354 2023-01-11T22:10:24.5283029Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41355 2023-01-11T22:10:24.5283397Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5283566Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5283938Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5284125Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5284486Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5284701Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5285066Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5285251Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5285486Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5285730Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5286126Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5286516Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5286746Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5286976Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5287246Z [1673474359.424593] [7c5487d9c02b:41355:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5287458Z [1673474359.437919] [7c5487d9c02b:41355:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5287690Z [1673474359.437919] [7c5487d9c02b:41355:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5287954Z [1673474359.420896] [7c5487d9c02b:41354:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5288181Z [1673474359.434685] [7c5487d9c02b:41354:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5288414Z [1673474359.434685] [7c5487d9c02b:41354:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5288513Z ok (6.699s) 2023-01-11T22:10:24.5288535Z 2023-01-11T22:10:24.5288793Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5288901Z Ran 1 test in 6.699s 2023-01-11T22:10:24.5288921Z 2023-01-11T22:10:24.5289010Z OK 2023-01-11T22:10:24.5289030Z 2023-01-11T22:10:24.5289135Z Generating XML reports... 
2023-01-11T22:10:24.5289572Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215914.xml 2023-01-11T22:10:24.5289934Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5290105Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5290537Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5290731Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5290750Z 2023-01-11T22:10:24.5290859Z Running tests... 2023-01-11T22:10:24.5291120Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5291427Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5291669Z test_ddp_new_tensor_in_fwd (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5291886Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41468 2023-01-11T22:10:24.5292101Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41469 2023-01-11T22:10:24.5292467Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5292637Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5293068Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5293258Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5293619Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5293775Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5294143Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5294323Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5294560Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5294804Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5295200Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5295587Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5295811Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5296032Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5296285Z [1673474368.700885] [7c5487d9c02b:41469:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5296514Z [1673474368.714320] [7c5487d9c02b:41469:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5296992Z [1673474368.714320] [7c5487d9c02b:41469:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5297769Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2023-01-11T22:10:24.5298041Z [1673474368.699230] [7c5487d9c02b:41468:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5298265Z [1673474368.712904] [7c5487d9c02b:41468:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5298588Z [1673474368.712904] [7c5487d9c02b:41468:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5299359Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. 
This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2023-01-11T22:10:24.5299595Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.5299830Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.5299933Z ok (6.605s) 2023-01-11T22:10:24.5299953Z 2023-01-11T22:10:24.5300226Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5300336Z Ran 1 test in 6.605s 2023-01-11T22:10:24.5300355Z 2023-01-11T22:10:24.5300521Z OK 2023-01-11T22:10:24.5300557Z 2023-01-11T22:10:24.5300669Z Generating XML reports... 2023-01-11T22:10:24.5301112Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215923.xml 2023-01-11T22:10:24.5301476Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5301647Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5302023Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5302209Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5302233Z 2023-01-11T22:10:24.5302336Z Running tests... 
2023-01-11T22:10:24.5302601Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5302899Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5303175Z test_ddp_new_tensor_in_fwd_static_graph (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5303909Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78338 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.661s) 2023-01-11T22:10:24.5303930Z 2023-01-11T22:10:24.5304188Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5304297Z Ran 1 test in 1.662s 2023-01-11T22:10:24.5304316Z 2023-01-11T22:10:24.5304423Z OK (skipped=1) 2023-01-11T22:10:24.5304446Z 2023-01-11T22:10:24.5304564Z Generating XML reports... 2023-01-11T22:10:24.5305006Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215932.xml 2023-01-11T22:10:24.5305376Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5305550Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5305909Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5306098Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5306117Z 2023-01-11T22:10:24.5306227Z Running tests... 
2023-01-11T22:10:24.5306483Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5306790Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5307125Z test_ddp_profiling_autograd_profiler (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5307916Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77342 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.628s) 2023-01-11T22:10:24.5307938Z 2023-01-11T22:10:24.5308204Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5308314Z Ran 1 test in 1.628s 2023-01-11T22:10:24.5308333Z 2023-01-11T22:10:24.5308424Z OK (skipped=1) 2023-01-11T22:10:24.5308461Z 2023-01-11T22:10:24.5308567Z Generating XML reports... 2023-01-11T22:10:24.5309006Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215936.xml 2023-01-11T22:10:24.5309374Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5309554Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5309983Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5310177Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5310196Z 2023-01-11T22:10:24.5310304Z Running tests... 
2023-01-11T22:10:24.5310561Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5310849Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5311118Z test_ddp_profiling_torch_profiler (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5311335Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41654 2023-01-11T22:10:24.5311547Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41655 2023-01-11T22:10:24.5311917Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5312094Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5312466Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5312650Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5312993Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5313163Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5313527Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5313709Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5313956Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5314199Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5314591Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5314981Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5315207Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5315413Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5315684Z [1673474386.224546] [7c5487d9c02b:41654:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5315966Z [1673474386.238189] [7c5487d9c02b:41654:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5316200Z [1673474386.238189] [7c5487d9c02b:41654:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5316540Z STAGE:2023-01-11 21:59:46 41654:41654 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5316809Z [1673474386.228756] [7c5487d9c02b:41655:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5317038Z [1673474386.241826] [7c5487d9c02b:41655:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5317273Z [1673474386.241826] [7c5487d9c02b:41655:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5317601Z STAGE:2023-01-11 21:59:46 41655:41655 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5317836Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.5318097Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:10:24.5318438Z STAGE:2023-01-11 21:59:47 41654:41654 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5318765Z STAGE:2023-01-11 21:59:47 41655:41655 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5319104Z STAGE:2023-01-11 21:59:47 41655:41655 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5319444Z STAGE:2023-01-11 21:59:47 41654:41654 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5320214Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2023-01-11T22:10:24.5320982Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. 
(function operator()) 2023-01-11T22:10:24.5321310Z STAGE:2023-01-11 21:59:47 41654:41654 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5321633Z STAGE:2023-01-11 21:59:47 41655:41655 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5321970Z STAGE:2023-01-11 21:59:47 41654:41654 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5322297Z STAGE:2023-01-11 21:59:47 41655:41655 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5322636Z STAGE:2023-01-11 21:59:47 41654:41654 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5322971Z STAGE:2023-01-11 21:59:47 41655:41655 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5323057Z ok (7.165s) 2023-01-11T22:10:24.5323077Z 2023-01-11T22:10:24.5323336Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5323442Z Ran 1 test in 7.166s 2023-01-11T22:10:24.5323462Z 2023-01-11T22:10:24.5323554Z OK 2023-01-11T22:10:24.5323573Z 2023-01-11T22:10:24.5323695Z Generating XML reports... 2023-01-11T22:10:24.5324200Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215940.xml 2023-01-11T22:10:24.5324570Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5324744Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5325102Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5325290Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5325310Z 2023-01-11T22:10:24.5325415Z Running tests... 
2023-01-11T22:10:24.5325670Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5325979Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5326244Z test_ddp_python_error_logged (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5326465Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41776 2023-01-11T22:10:24.5326726Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41777 2023-01-11T22:10:24.5327101Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5327258Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5327629Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5327816Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5328177Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5328349Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5328728Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5328917Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5329163Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5329387Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5329782Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5330171Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5330396Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5330621Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5330892Z [1673474395.951049] [7c5487d9c02b:41777:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5331124Z [1673474395.964493] [7c5487d9c02b:41777:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5331361Z [1673474395.964493] [7c5487d9c02b:41777:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5331627Z [1673474395.942574] [7c5487d9c02b:41776:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5331851Z [1673474395.956436] [7c5487d9c02b:41776:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5332068Z [1673474395.956436] [7c5487d9c02b:41776:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5332230Z ok (6.166s) 2023-01-11T22:10:24.5332250Z 2023-01-11T22:10:24.5332519Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5332630Z Ran 1 test in 6.166s 2023-01-11T22:10:24.5332653Z 2023-01-11T22:10:24.5332747Z OK 2023-01-11T22:10:24.5332766Z 2023-01-11T22:10:24.5332888Z Generating XML reports... 
2023-01-11T22:10:24.5333324Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215950.xml 2023-01-11T22:10:24.5333689Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5333859Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5334215Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5334405Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5334428Z 2023-01-11T22:10:24.5334534Z Running tests... 2023-01-11T22:10:24.5334793Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5335148Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5335424Z test_ddp_returns_tensor_with_no_grad (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5336159Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78595 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.587s) 2023-01-11T22:10:24.5336179Z 2023-01-11T22:10:24.5336434Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5336773Z Ran 1 test in 1.587s 2023-01-11T22:10:24.5336795Z 2023-01-11T22:10:24.5336900Z OK (skipped=1) 2023-01-11T22:10:24.5336942Z 2023-01-11T22:10:24.5337050Z Generating XML reports... 
2023-01-11T22:10:24.5337503Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215959.xml 2023-01-11T22:10:24.5337873Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5338045Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5338418Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5338603Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5338622Z 2023-01-11T22:10:24.5338730Z Running tests... 2023-01-11T22:10:24.5338985Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5339278Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5339561Z test_ddp_shared_grad_acc_unused_params (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5339781Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41924 2023-01-11T22:10:24.5339995Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41925 2023-01-11T22:10:24.5340355Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5340524Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5340886Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5341069Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5341415Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5341679Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5342060Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5342252Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5342494Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5342736Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5343133Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5343521Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5343750Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5343967Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5344928Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1911: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2023-01-11T22:10:24.5345056Z warnings.warn( 2023-01-11T22:10:24.5345957Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1911: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2023-01-11T22:10:24.5346067Z warnings.warn( 2023-01-11T22:10:24.5346298Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.5346530Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.5346801Z [1673474408.671265] [7c5487d9c02b:41924:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.5347028Z [1673474408.684918] [7c5487d9c02b:41924:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5347262Z [1673474408.684918] [7c5487d9c02b:41924:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5347525Z [1673474408.671286] [7c5487d9c02b:41925:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5347734Z [1673474408.684285] [7c5487d9c02b:41925:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5347969Z [1673474408.684285] [7c5487d9c02b:41925:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5348074Z ok (6.567s) 2023-01-11T22:10:24.5348094Z 2023-01-11T22:10:24.5348356Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5348464Z Ran 1 test in 6.567s 2023-01-11T22:10:24.5348483Z 2023-01-11T22:10:24.5348573Z OK 2023-01-11T22:10:24.5348592Z 2023-01-11T22:10:24.5348718Z Generating XML reports... 2023-01-11T22:10:24.5349157Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220003.xml 2023-01-11T22:10:24.5349523Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5349681Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5350052Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5350295Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5350319Z 2023-01-11T22:10:24.5350426Z Running tests... 
2023-01-11T22:10:24.5350689Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5351000Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5351269Z test_ddp_static_graph_nested_types (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5352008Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77625 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.650s) 2023-01-11T22:10:24.5352029Z 2023-01-11T22:10:24.5352291Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5352398Z Ran 1 test in 1.651s 2023-01-11T22:10:24.5352418Z 2023-01-11T22:10:24.5352509Z OK (skipped=1) 2023-01-11T22:10:24.5352591Z 2023-01-11T22:10:24.5352718Z Generating XML reports... 2023-01-11T22:10:24.5353157Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220012.xml 2023-01-11T22:10:24.5353522Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5353696Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5354067Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5354254Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5354273Z 2023-01-11T22:10:24.5354379Z Running tests... 
2023-01-11T22:10:24.5354629Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5354931Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5355197Z test_ddp_sync_bn_training_vs_eval (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5355412Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42076 2023-01-11T22:10:24.5355628Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42077 2023-01-11T22:10:24.5355996Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5356173Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5356548Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5356736Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5357084Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5357258Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5357628Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5357814Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5358054Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5358297Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5358690Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5359077Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5359357Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5359570Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5359841Z [1673474421.951357] [7c5487d9c02b:42077:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5360072Z [1673474421.964896] [7c5487d9c02b:42077:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5360301Z [1673474421.964896] [7c5487d9c02b:42077:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5360639Z STAGE:2023-01-11 22:00:22 42077:42077 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5360908Z [1673474421.949089] [7c5487d9c02b:42076:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5361182Z [1673474421.963228] [7c5487d9c02b:42076:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5361422Z [1673474421.963228] [7c5487d9c02b:42076:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5361756Z STAGE:2023-01-11 22:00:22 42076:42076 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5361972Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:10:24.5362201Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:10:24.5362528Z STAGE:2023-01-11 22:00:22 42077:42077 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5362854Z STAGE:2023-01-11 22:00:22 42076:42076 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5363198Z STAGE:2023-01-11 22:00:22 42076:42076 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5363534Z STAGE:2023-01-11 22:00:22 42077:42077 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5363855Z STAGE:2023-01-11 22:00:22 42076:42076 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5364173Z STAGE:2023-01-11 22:00:23 42076:42076 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5364502Z STAGE:2023-01-11 22:00:23 42076:42076 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5364587Z ok (7.532s) 2023-01-11T22:10:24.5364621Z 2023-01-11T22:10:24.5364867Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5364978Z Ran 1 test in 7.532s 2023-01-11T22:10:24.5364997Z 2023-01-11T22:10:24.5365086Z OK 2023-01-11T22:10:24.5365105Z 2023-01-11T22:10:24.5365225Z Generating XML reports... 
2023-01-11T22:10:24.5365667Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220016.xml 2023-01-11T22:10:24.5366033Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5366207Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5366583Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5366755Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5366775Z 2023-01-11T22:10:24.5366886Z Running tests... 2023-01-11T22:10:24.5367145Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5367453Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5367716Z test_ddp_sync_module_states (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5367990Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42198 2023-01-11T22:10:24.5368208Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42199 2023-01-11T22:10:24.5368581Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5368738Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5369112Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5369300Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5369664Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5369836Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5370213Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5370448Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5370699Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5370942Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5371324Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5371717Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5371944Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5372167Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5372443Z [1673474432.286305] [7c5487d9c02b:42199:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5372672Z [1673474432.302573] [7c5487d9c02b:42199:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5372905Z [1673474432.302573] [7c5487d9c02b:42199:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5373171Z [1673474432.277479] [7c5487d9c02b:42198:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5373394Z [1673474432.291369] [7c5487d9c02b:42198:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5373619Z [1673474432.291369] [7c5487d9c02b:42198:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5373708Z ok (6.372s) 2023-01-11T22:10:24.5373729Z 2023-01-11T22:10:24.5373999Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5374112Z Ran 1 test in 6.372s 2023-01-11T22:10:24.5374132Z 2023-01-11T22:10:24.5374223Z OK 2023-01-11T22:10:24.5374242Z 2023-01-11T22:10:24.5374366Z Generating XML reports... 
2023-01-11T22:10:24.5374801Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220026.xml 2023-01-11T22:10:24.5375160Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5375333Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5375691Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5375879Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5375953Z 2023-01-11T22:10:24.5376063Z Running tests... 2023-01-11T22:10:24.5376321Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5376861Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5377144Z test_ddp_uneven_input_exception (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5377363Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42312 2023-01-11T22:10:24.5377576Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42313 2023-01-11T22:10:24.5377944Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5378100Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5378474Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5378664Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5379109Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5379292Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5379659Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5379843Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5380083Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5380307Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5380704Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5381103Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5381328Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5381549Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5381815Z [1673474440.985953] [7c5487d9c02b:42313:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5382045Z [1673474440.999316] [7c5487d9c02b:42313:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5382280Z [1673474440.999316] [7c5487d9c02b:42313:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5382544Z [1673474440.984370] [7c5487d9c02b:42312:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5382776Z [1673474440.997700] [7c5487d9c02b:42312:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5382991Z [1673474440.997700] [7c5487d9c02b:42312:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5383093Z ok (6.136s) 2023-01-11T22:10:24.5383114Z 2023-01-11T22:10:24.5383380Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5383490Z Ran 1 test in 6.136s 2023-01-11T22:10:24.5383509Z 2023-01-11T22:10:24.5383597Z OK 2023-01-11T22:10:24.5383616Z 2023-01-11T22:10:24.5383733Z Generating XML reports... 
2023-01-11T22:10:24.5384173Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220035.xml 2023-01-11T22:10:24.5384537Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5384789Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5385157Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5385345Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5385364Z 2023-01-11T22:10:24.5385472Z Running tests... 2023-01-11T22:10:24.5385736Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5386042Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5386309Z test_ddp_uneven_input_join_disable (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5387042Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78684 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.669s) 2023-01-11T22:10:24.5387066Z 2023-01-11T22:10:24.5387367Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5387481Z Ran 1 test in 1.669s 2023-01-11T22:10:24.5387500Z 2023-01-11T22:10:24.5387591Z OK (skipped=1) 2023-01-11T22:10:24.5387627Z 2023-01-11T22:10:24.5387733Z Generating XML reports... 
2023-01-11T22:10:24.5388168Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220044.xml 2023-01-11T22:10:24.5388534Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5388705Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5389072Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5389264Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5389283Z 2023-01-11T22:10:24.5389392Z Running tests... 2023-01-11T22:10:24.5389651Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5389940Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5390191Z test_ddp_uneven_inputs (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5390928Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/75648 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.617s) 2023-01-11T22:10:24.5390949Z 2023-01-11T22:10:24.5391202Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5391312Z Ran 1 test in 1.617s 2023-01-11T22:10:24.5391331Z 2023-01-11T22:10:24.5391433Z OK (skipped=1) 2023-01-11T22:10:24.5391452Z 2023-01-11T22:10:24.5391570Z Generating XML reports... 
2023-01-11T22:10:24.5392007Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220048.xml 2023-01-11T22:10:24.5392372Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5392547Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5392905Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5393096Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5393115Z 2023-01-11T22:10:24.5393226Z Running tests... 2023-01-11T22:10:24.5393484Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5393853Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5394141Z test_ddp_uneven_inputs_stop_iteration_sync_bn (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5394875Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78113 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.644s) 2023-01-11T22:10:24.5394896Z 2023-01-11T22:10:24.5395155Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5395267Z Ran 1 test in 1.645s 2023-01-11T22:10:24.5395286Z 2023-01-11T22:10:24.5395376Z OK (skipped=1) 2023-01-11T22:10:24.5395408Z 2023-01-11T22:10:24.5395515Z Generating XML reports... 
2023-01-11T22:10:24.5395952Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220052.xml 2023-01-11T22:10:24.5396320Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5396545Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5396928Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5397113Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5397132Z 2023-01-11T22:10:24.5397240Z Running tests... 2023-01-11T22:10:24.5397494Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5397783Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5398074Z test_ddp_unused_params_rebuild_buckets_exception (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5398295Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42528 2023-01-11T22:10:24.5398508Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42529 2023-01-11T22:10:24.5398879Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5399048Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5399424Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5399613Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5399958Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5400132Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5400503Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5400692Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5400935Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5401175Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5401569Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5401956Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5402183Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5402392Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5402662Z [1673474462.273535] [7c5487d9c02b:42528:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5402949Z [1673474462.287227] [7c5487d9c02b:42528:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5403183Z [1673474462.287227] [7c5487d9c02b:42528:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5403447Z [1673474462.278627] [7c5487d9c02b:42529:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5403672Z [1673474462.291024] [7c5487d9c02b:42529:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5403905Z [1673474462.291024] [7c5487d9c02b:42529:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5404007Z ok (6.666s) 2023-01-11T22:10:24.5404030Z 2023-01-11T22:10:24.5404293Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5404405Z Ran 1 test in 6.666s 2023-01-11T22:10:24.5404425Z 2023-01-11T22:10:24.5404500Z OK 2023-01-11T22:10:24.5404519Z 2023-01-11T22:10:24.5404687Z Generating XML reports... 
2023-01-11T22:10:24.5405138Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220056.xml 2023-01-11T22:10:24.5405510Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5405686Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5406056Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5406244Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5406264Z 2023-01-11T22:10:24.5406371Z Running tests... 2023-01-11T22:10:24.5406621Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5406929Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5407199Z test_ddp_zero_output_features (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5407417Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42646 2023-01-11T22:10:24.5407679Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42647 2023-01-11T22:10:24.5408051Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5408224Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5408594Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5408775Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5409124Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5409295Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5409667Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5409851Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5410092Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5410334Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5410728Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5411118Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5411408Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5411773Z /opt/conda/lib/python3.10/site-packages/torch/nn/init.py:405: UserWarning: Initializing zero-element tensors is a no-op 2023-01-11T22:10:24.5412029Z warnings.warn("Initializing zero-element tensors is a no-op") 2023-01-11T22:10:24.5412260Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5412633Z /opt/conda/lib/python3.10/site-packages/torch/nn/init.py:405: UserWarning: Initializing zero-element tensors is a no-op 2023-01-11T22:10:24.5412887Z warnings.warn("Initializing zero-element tensors is a no-op") 2023-01-11T22:10:24.5413157Z [1673474471.410438] [7c5487d9c02b:42647:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5413386Z [1673474471.423649] [7c5487d9c02b:42647:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5413680Z [1673474471.423649] [7c5487d9c02b:42647:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5413954Z [1673474471.409196] [7c5487d9c02b:42646:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.5414163Z [1673474471.423310] [7c5487d9c02b:42646:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5414398Z [1673474471.423310] [7c5487d9c02b:42646:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5414497Z ok (6.157s) 2023-01-11T22:10:24.5414518Z 2023-01-11T22:10:24.5414783Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5414892Z Ran 1 test in 6.157s 2023-01-11T22:10:24.5414916Z 2023-01-11T22:10:24.5415006Z OK 2023-01-11T22:10:24.5415025Z 2023-01-11T22:10:24.5415144Z Generating XML reports... 2023-01-11T22:10:24.5415586Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220106.xml 2023-01-11T22:10:24.5415949Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5416107Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5416477Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5416898Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5416922Z 2023-01-11T22:10:24.5417039Z Running tests... 2023-01-11T22:10:24.5417304Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5417615Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5417879Z test_destroy_full_group (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5418095Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42760 2023-01-11T22:10:24.5418294Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42761 2023-01-11T22:10:24.5418663Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5418837Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5419207Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5419394Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5419758Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5420021Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5420401Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5420587Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5420812Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5421048Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5421441Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5421831Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5422057Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5422297Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:10:24.5422582Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5422821Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:10:24.5423208Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.5423574Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.5423679Z ok (4.251s) 2023-01-11T22:10:24.5423699Z 2023-01-11T22:10:24.5423960Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5424069Z Ran 1 test in 4.251s 2023-01-11T22:10:24.5424088Z 2023-01-11T22:10:24.5424181Z OK 2023-01-11T22:10:24.5424200Z 2023-01-11T22:10:24.5424330Z Generating XML reports... 2023-01-11T22:10:24.5424772Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220114.xml 2023-01-11T22:10:24.5425135Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5425293Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5425664Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5425846Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5425865Z 2023-01-11T22:10:24.5425969Z Running tests... 
2023-01-11T22:10:24.5426226Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5426531Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5426784Z test_destroy_group (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5426999Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42863 2023-01-11T22:10:24.5427213Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42864 2023-01-11T22:10:24.5427564Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5427731Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5428102Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5428282Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5428642Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5428815Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5429244Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5429434Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5429657Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5429896Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5430287Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5430673Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5430900Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5431135Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:10:24.5431360Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5431637Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:10:24.5432041Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.5432423Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.5432509Z ok (4.355s) 2023-01-11T22:10:24.5432529Z 2023-01-11T22:10:24.5432792Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5432901Z Ran 1 test in 4.356s 2023-01-11T22:10:24.5432921Z 2023-01-11T22:10:24.5433006Z OK 2023-01-11T22:10:24.5433026Z 2023-01-11T22:10:24.5433147Z Generating XML reports... 
2023-01-11T22:10:24.5433581Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220121.xml 2023-01-11T22:10:24.5433949Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5434124Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5434480Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5434671Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5434690Z 2023-01-11T22:10:24.5434797Z Running tests... 2023-01-11T22:10:24.5435053Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5435355Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5435628Z test_detect_ddp_is_actually_static (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5436374Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78767 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.644s) 2023-01-11T22:10:24.5436396Z 2023-01-11T22:10:24.5436656Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5436766Z Ran 1 test in 1.644s 2023-01-11T22:10:24.5436786Z 2023-01-11T22:10:24.5436893Z OK (skipped=1) 2023-01-11T22:10:24.5436912Z 2023-01-11T22:10:24.5437018Z Generating XML reports... 
2023-01-11T22:10:24.5437453Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220128.xml 2023-01-11T22:10:24.5437814Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5437984Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5438446Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5438637Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5438656Z 2023-01-11T22:10:24.5438765Z Running tests... 2023-01-11T22:10:24.5439021Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5439311Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5439582Z test_different_graph_across_ranks (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5440313Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78748 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.618s) 2023-01-11T22:10:24.5440337Z 2023-01-11T22:10:24.5440597Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5440708Z Ran 1 test in 1.619s 2023-01-11T22:10:24.5440773Z 2023-01-11T22:10:24.5440886Z OK (skipped=1) 2023-01-11T22:10:24.5440905Z 2023-01-11T22:10:24.5441029Z Generating XML reports... 
2023-01-11T22:10:24.5441468Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220132.xml 2023-01-11T22:10:24.5441836Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5442008Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5442366Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5442549Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5442572Z 2023-01-11T22:10:24.5442678Z Running tests... 2023-01-11T22:10:24.5442936Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5443242Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5443505Z test_dump_DDP_relevant_env_vars (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5443718Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43034 2023-01-11T22:10:24.5443930Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43035 2023-01-11T22:10:24.5444279Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5444451Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5444817Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5445004Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5445361Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5445528Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5445897Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5446087Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5446328Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5446552Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5446945Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5447398Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5447632Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5447855Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5447959Z ok (4.255s) 2023-01-11T22:10:24.5447979Z 2023-01-11T22:10:24.5448235Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5448344Z Ran 1 test in 4.255s 2023-01-11T22:10:24.5448363Z 2023-01-11T22:10:24.5448453Z OK 2023-01-11T22:10:24.5448472Z 2023-01-11T22:10:24.5448577Z Generating XML reports... 2023-01-11T22:10:24.5449012Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220136.xml 2023-01-11T22:10:24.5449371Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5449545Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5449960Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5450152Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5450172Z 2023-01-11T22:10:24.5450283Z Running tests... 2023-01-11T22:10:24.5450542Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5450834Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5451083Z test_gather (__main__.TestDistBackendWithSpawn) ... 
skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T22:10:24.5451103Z 2023-01-11T22:10:24.5451355Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5451462Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5451482Z 2023-01-11T22:10:24.5451590Z OK (skipped=1) 2023-01-11T22:10:24.5451609Z 2023-01-11T22:10:24.5451732Z Generating XML reports... 2023-01-11T22:10:24.5452173Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220143.xml 2023-01-11T22:10:24.5452534Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5452705Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5453061Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5453249Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5453269Z 2023-01-11T22:10:24.5453375Z Running tests... 2023-01-11T22:10:24.5453632Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5453938Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5454199Z test_gather_checks (__main__.TestDistBackendWithSpawn) ... skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T22:10:24.5454220Z 2023-01-11T22:10:24.5454475Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5454586Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5454606Z 2023-01-11T22:10:24.5454716Z OK (skipped=1) 2023-01-11T22:10:24.5454735Z 2023-01-11T22:10:24.5454841Z Generating XML reports... 
2023-01-11T22:10:24.5455278Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220146.xml 2023-01-11T22:10:24.5455643Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5455813Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5456183Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5456429Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5456450Z 2023-01-11T22:10:24.5456792Z Running tests... 2023-01-11T22:10:24.5457073Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5457370Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5457625Z test_gather_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA gather (0.002s) 2023-01-11T22:10:24.5457645Z 2023-01-11T22:10:24.5457900Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5458005Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5458024Z 2023-01-11T22:10:24.5458125Z OK (skipped=1) 2023-01-11T22:10:24.5458145Z 2023-01-11T22:10:24.5458262Z Generating XML reports... 
2023-01-11T22:10:24.5458701Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220148.xml 2023-01-11T22:10:24.5459070Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5459322Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5459694Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5459880Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5459900Z 2023-01-11T22:10:24.5460003Z Running tests... 2023-01-11T22:10:24.5460260Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5460559Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5460822Z test_gather_full_group (__main__.TestDistBackendWithSpawn) ... skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T22:10:24.5460842Z 2023-01-11T22:10:24.5461097Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5461206Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5461225Z 2023-01-11T22:10:24.5461333Z OK (skipped=1) 2023-01-11T22:10:24.5461352Z 2023-01-11T22:10:24.5461461Z Generating XML reports... 
2023-01-11T22:10:24.5461900Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220150.xml 2023-01-11T22:10:24.5462264Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5462436Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5462860Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5463051Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5463071Z 2023-01-11T22:10:24.5463179Z Running tests... 2023-01-11T22:10:24.5463444Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5463731Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5463993Z test_gather_group (__main__.TestDistBackendWithSpawn) ... skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T22:10:24.5464013Z 2023-01-11T22:10:24.5464266Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5464377Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5464397Z 2023-01-11T22:10:24.5464505Z OK (skipped=1) 2023-01-11T22:10:24.5464524Z 2023-01-11T22:10:24.5464645Z Generating XML reports... 
2023-01-11T22:10:24.5465085Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220153.xml 2023-01-11T22:10:24.5465448Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5465623Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5466079Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5466272Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5466291Z 2023-01-11T22:10:24.5466404Z Running tests... 2023-01-11T22:10:24.5466663Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5466969Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5467229Z test_gather_object (__main__.TestDistBackendWithSpawn) ... skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T22:10:24.5467248Z 2023-01-11T22:10:24.5467504Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5467615Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5467634Z 2023-01-11T22:10:24.5467740Z OK (skipped=1) 2023-01-11T22:10:24.5467759Z 2023-01-11T22:10:24.5467869Z Generating XML reports... 
2023-01-11T22:10:24.5468303Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220155.xml 2023-01-11T22:10:24.5468711Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5468892Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5469270Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5469452Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5469472Z 2023-01-11T22:10:24.5469576Z Running tests... 2023-01-11T22:10:24.5469833Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5470140Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5470395Z test_gather_object_subgroup (__main__.TestDistBackendWithSpawn) ... skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T22:10:24.5470419Z 2023-01-11T22:10:24.5470678Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5470789Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5470809Z 2023-01-11T22:10:24.5470911Z OK (skipped=1) 2023-01-11T22:10:24.5470930Z 2023-01-11T22:10:24.5471050Z Generating XML reports... 
2023-01-11T22:10:24.5471480Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220158.xml 2023-01-11T22:10:24.5471842Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5472011Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5472365Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5472555Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5472574Z 2023-01-11T22:10:24.5472684Z Running tests... 2023-01-11T22:10:24.5472946Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5473253Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5473495Z test_get_backend (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5473708Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43368 2023-01-11T22:10:24.5473923Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43369 2023-01-11T22:10:24.5474287Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5474444Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5474816Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5475062Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5475428Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5475599Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5475966Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5476154Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5476394Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5476619Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5477012Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5477406Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5477684Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5477922Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:10:24.5478142Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5478372Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:10:24.5478762Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.5479146Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.5479236Z ok (4.376s) 2023-01-11T22:10:24.5479273Z 2023-01-11T22:10:24.5479516Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5479627Z Ran 1 test in 4.377s 2023-01-11T22:10:24.5479646Z 2023-01-11T22:10:24.5479742Z OK 2023-01-11T22:10:24.5479762Z 2023-01-11T22:10:24.5479883Z Generating XML reports... 2023-01-11T22:10:24.5480318Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220200.xml 2023-01-11T22:10:24.5480679Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5480850Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5481221Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5481392Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5481429Z 2023-01-11T22:10:24.5481520Z Running tests... 
2023-01-11T22:10:24.5481776Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5482082Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5482354Z test_get_future (__main__.TestDistBackendWithSpawn) ... skip: get_future is only supported on mpi, nccl and gloo (0.002s) 2023-01-11T22:10:24.5482373Z 2023-01-11T22:10:24.5482629Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5482738Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5482758Z 2023-01-11T22:10:24.5482862Z OK (skipped=1) 2023-01-11T22:10:24.5482882Z 2023-01-11T22:10:24.5483003Z Generating XML reports... 2023-01-11T22:10:24.5483421Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220207.xml 2023-01-11T22:10:24.5483781Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5484009Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5484391Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5484578Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5484597Z 2023-01-11T22:10:24.5484706Z Running tests... 2023-01-11T22:10:24.5484961Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5485269Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5485491Z test_get_rank (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5485713Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43504 2023-01-11T22:10:24.5485925Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43505 2023-01-11T22:10:24.5486290Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5486460Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5486873Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5487066Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5487431Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5487604Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5487957Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5488146Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5488386Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5488631Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5489031Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5489416Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5489644Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5489869Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5489953Z ok (4.469s) 2023-01-11T22:10:24.5489989Z 2023-01-11T22:10:24.5490233Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5490343Z Ran 1 test in 4.469s 2023-01-11T22:10:24.5490362Z 2023-01-11T22:10:24.5490452Z OK 2023-01-11T22:10:24.5490471Z 2023-01-11T22:10:24.5490594Z Generating XML reports... 2023-01-11T22:10:24.5491032Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220209.xml 2023-01-11T22:10:24.5491393Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5491564Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5491934Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5492104Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5492137Z 2023-01-11T22:10:24.5492228Z Running tests... 2023-01-11T22:10:24.5492484Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5492791Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5493109Z test_get_rank_size_full_group (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5493328Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43607 2023-01-11T22:10:24.5493543Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43608 2023-01-11T22:10:24.5493905Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5494061Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5494434Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5494618Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5494973Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5495148Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5495522Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5495756Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5496004Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5496243Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5496855Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5497275Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5497500Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5497744Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:10:24.5497966Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5498201Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:10:24.5498594Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.5498983Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.5499084Z ok (4.219s) 2023-01-11T22:10:24.5499104Z 2023-01-11T22:10:24.5499349Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5499459Z Ran 1 test in 4.219s 2023-01-11T22:10:24.5499479Z 2023-01-11T22:10:24.5499571Z OK 2023-01-11T22:10:24.5499591Z 2023-01-11T22:10:24.5499713Z Generating XML reports... 2023-01-11T22:10:24.5500153Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220216.xml 2023-01-11T22:10:24.5500524Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5500700Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5501074Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5501264Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5501284Z 2023-01-11T22:10:24.5501374Z Running tests... 
2023-01-11T22:10:24.5501637Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5501942Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5502201Z test_get_rank_size_group (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5502508Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43710 2023-01-11T22:10:24.5502729Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43711 2023-01-11T22:10:24.5503098Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5503268Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5503625Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5503813Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5504179Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5504353Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5504729Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5504915Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5505215Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5505464Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5505855Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5506229Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5506456Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5506686Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:10:24.5506907Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5507142Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:10:24.5507529Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.5507962Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.5508062Z ok (4.347s) 2023-01-11T22:10:24.5508083Z 2023-01-11T22:10:24.5508346Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5508441Z Ran 1 test in 4.347s 2023-01-11T22:10:24.5508460Z 2023-01-11T22:10:24.5508551Z OK 2023-01-11T22:10:24.5508571Z 2023-01-11T22:10:24.5508691Z Generating XML reports... 
2023-01-11T22:10:24.5509128Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220223.xml 2023-01-11T22:10:24.5509500Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5509673Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5510046Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5510235Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5510255Z 2023-01-11T22:10:24.5510345Z Running tests... 2023-01-11T22:10:24.5510603Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5510908Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5511165Z test_invalid_static_graph (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5511382Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43813 2023-01-11T22:10:24.5511659Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43814 2023-01-11T22:10:24.5512033Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5512207Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5512583Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5512755Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5513116Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5513291Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5513661Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5513846Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5514134Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5514383Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5514776Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5515148Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5515375Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5515606Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5515872Z [1673474555.658840] [7c5487d9c02b:43813:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5516109Z [1673474555.672509] [7c5487d9c02b:43813:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5516339Z [1673474555.672509] [7c5487d9c02b:43813:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5516605Z [1673474555.663306] [7c5487d9c02b:43814:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5516832Z [1673474555.676760] [7c5487d9c02b:43814:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5517065Z [1673474555.676760] [7c5487d9c02b:43814:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5517164Z ok (6.516s) 2023-01-11T22:10:24.5517185Z 2023-01-11T22:10:24.5517435Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5517542Z Ran 1 test in 6.516s 2023-01-11T22:10:24.5517562Z 2023-01-11T22:10:24.5517649Z OK 2023-01-11T22:10:24.5517668Z 2023-01-11T22:10:24.5517791Z Generating XML reports... 
2023-01-11T22:10:24.5518230Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220230.xml 2023-01-11T22:10:24.5518596Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5518770Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5519141Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5519330Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5519350Z 2023-01-11T22:10:24.5519441Z Running tests... 2023-01-11T22:10:24.5519765Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5520072Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5520311Z test_irecv (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5520524Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43931 2023-01-11T22:10:24.5520737Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43932 2023-01-11T22:10:24.5521099Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5521272Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5521630Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5521819Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5522186Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5522360Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5522784Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5522976Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5523215Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5523454Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5523849Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5524222Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5524455Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5524683Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5524955Z [1673474563.456904] [7c5487d9c02b:43931:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5525180Z [1673474564.902892] [7c5487d9c02b:43931:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5525413Z [1673474564.902892] [7c5487d9c02b:43931:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5525676Z [1673474563.477583] [7c5487d9c02b:43932:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5525902Z [1673474564.891113] [7c5487d9c02b:43932:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5526137Z [1673474564.891113] [7c5487d9c02b:43932:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5526223Z ok (6.275s) 2023-01-11T22:10:24.5526257Z 2023-01-11T22:10:24.5526506Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5526617Z Ran 1 test in 6.275s 2023-01-11T22:10:24.5526637Z 2023-01-11T22:10:24.5526725Z OK 2023-01-11T22:10:24.5526744Z 2023-01-11T22:10:24.5526863Z Generating XML reports... 
2023-01-11T22:10:24.5527301Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220239.xml 2023-01-11T22:10:24.5527657Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5527830Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5528268Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5528445Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5528484Z 2023-01-11T22:10:24.5528576Z Running tests... 2023-01-11T22:10:24.5528832Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5529135Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5529369Z test_isend (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5529583Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44041 2023-01-11T22:10:24.5529797Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44042 2023-01-11T22:10:24.5530155Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5530315Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5530735Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5530925Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5531285Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5531454Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5531821Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5532003Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5532245Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5532483Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5532863Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5533253Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5533480Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5533706Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5533974Z [1673474572.327443] [7c5487d9c02b:44042:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5534201Z [1673474573.721614] [7c5487d9c02b:44042:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5534434Z [1673474573.721614] [7c5487d9c02b:44042:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5534709Z [1673474572.307474] [7c5487d9c02b:44041:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5534935Z [1673474573.712280] [7c5487d9c02b:44041:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5535168Z [1673474573.712280] [7c5487d9c02b:44041:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5535254Z ok (6.167s) 2023-01-11T22:10:24.5535274Z 2023-01-11T22:10:24.5535538Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5535644Z Ran 1 test in 6.167s 2023-01-11T22:10:24.5535664Z 2023-01-11T22:10:24.5535753Z OK 2023-01-11T22:10:24.5535772Z 2023-01-11T22:10:24.5535891Z Generating XML reports... 
2023-01-11T22:10:24.5536328Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220248.xml 2023-01-11T22:10:24.5536997Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5537181Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5537547Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5537728Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5537749Z 2023-01-11T22:10:24.5537853Z Running tests... 2023-01-11T22:10:24.5538111Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5538415Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5538681Z test_isend_autograd_profiler (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5538903Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44151 2023-01-11T22:10:24.5539111Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44152 2023-01-11T22:10:24.5539561Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5539728Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5540107Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5540293Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5540655Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5540827Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5541196Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5541389Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5541631Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5541853Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5542248Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5542638Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5542865Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5543192Z STAGE:2023-01-11 22:03:00 44151:44151 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5543415Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5543744Z STAGE:2023-01-11 22:03:00 44152:44152 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5544019Z [1673474581.013400] [7c5487d9c02b:44152:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5544252Z [1673474582.659860] [7c5487d9c02b:44152:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5544484Z [1673474582.659860] [7c5487d9c02b:44152:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5544802Z STAGE:2023-01-11 22:03:03 44152:44152 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5545143Z STAGE:2023-01-11 22:03:03 44152:44152 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5545407Z [1673474580.993125] [7c5487d9c02b:44151:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.5545711Z [1673474582.632219] [7c5487d9c02b:44151:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5545945Z [1673474582.632219] [7c5487d9c02b:44151:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5546280Z STAGE:2023-01-11 22:03:03 44151:44151 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5546625Z STAGE:2023-01-11 22:03:03 44151:44151 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5546728Z ok (6.617s) 2023-01-11T22:10:24.5546748Z 2023-01-11T22:10:24.5547007Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5547101Z Ran 1 test in 6.617s 2023-01-11T22:10:24.5547121Z 2023-01-11T22:10:24.5547212Z OK 2023-01-11T22:10:24.5547234Z 2023-01-11T22:10:24.5547358Z Generating XML reports... 2023-01-11T22:10:24.5547797Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220257.xml 2023-01-11T22:10:24.5548211Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5548388Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5548766Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5548953Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5548973Z 2023-01-11T22:10:24.5549075Z Running tests... 2023-01-11T22:10:24.5549317Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5549625Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5549892Z test_isend_torch_profiler (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5550112Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44265 2023-01-11T22:10:24.5550329Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44266 2023-01-11T22:10:24.5550691Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5550865Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5551238Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5551407Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5551766Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5551935Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5552308Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5552497Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5552738Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5552977Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5553370Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5553758Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5553968Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5554300Z STAGE:2023-01-11 22:03:10 44266:44266 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5554583Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5554914Z STAGE:2023-01-11 22:03:10 44265:44265 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5555185Z [1673474590.202143] [7c5487d9c02b:44265:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5555410Z [1673474591.877752] [7c5487d9c02b:44265:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5555644Z [1673474591.877752] [7c5487d9c02b:44265:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5555977Z STAGE:2023-01-11 22:03:12 44265:44265 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5556240Z [1673474590.222904] [7c5487d9c02b:44266:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.5556516Z [1673474591.825878] [7c5487d9c02b:44266:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5556739Z [1673474591.825878] [7c5487d9c02b:44266:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5557073Z STAGE:2023-01-11 22:03:12 44266:44266 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5557414Z STAGE:2023-01-11 22:03:12 44265:44265 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5557753Z STAGE:2023-01-11 22:03:12 44266:44266 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5557853Z ok (6.644s) 2023-01-11T22:10:24.5557874Z 2023-01-11T22:10:24.5558135Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5558251Z Ran 1 test in 6.644s 2023-01-11T22:10:24.5558271Z 2023-01-11T22:10:24.5558360Z OK 2023-01-11T22:10:24.5558379Z 2023-01-11T22:10:24.5558486Z Generating XML reports... 2023-01-11T22:10:24.5558935Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220306.xml 2023-01-11T22:10:24.5559296Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5559465Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5559834Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5560017Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5560037Z 2023-01-11T22:10:24.5560136Z Running tests... 
2023-01-11T22:10:24.5560391Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5560702Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5560966Z test_monitored_barrier_allreduce_hang (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5561181Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44379 2023-01-11T22:10:24.5561396Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44380 2023-01-11T22:10:24.5561762Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5561933Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5562293Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5562466Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5562838Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5563066Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5563443Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5563625Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5563864Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5564106Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5564499Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5564889Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5565122Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5565358Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:10:24.5565605Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5565846Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:10:24.5566238Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.5566622Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.5566859Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:10:24.5567097Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:10:24.5567480Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:10:24.5567865Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 
2023-01-11T22:10:24.5568093Z [E ProcessGroupGloo.cpp:138] [Rank 0]: Rank 1 failed to pass monitoredBarrier in 100 ms 2023-01-11T22:10:24.5568179Z ok (22.881s) 2023-01-11T22:10:24.5568217Z 2023-01-11T22:10:24.5568464Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5568576Z Ran 1 test in 22.882s 2023-01-11T22:10:24.5568596Z 2023-01-11T22:10:24.5568681Z OK 2023-01-11T22:10:24.5568701Z 2023-01-11T22:10:24.5568816Z Generating XML reports... 2023-01-11T22:10:24.5569252Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220315.xml 2023-01-11T22:10:24.5569613Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5569786Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5570158Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5570330Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5570367Z 2023-01-11T22:10:24.5570459Z Running tests... 2023-01-11T22:10:24.5570719Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5571019Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5571310Z test_monitored_barrier_allreduce_hang_wait_all_ranks (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5571527Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44500 2023-01-11T22:10:24.5571796Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44501 2023-01-11T22:10:24.5572158Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5572329Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5572683Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5572863Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5573221Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5573398Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5573771Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5573957Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5574202Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5574520Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5574908Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5575299Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5575526Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5575762Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:10:24.5575980Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5576208Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:10:24.5576815Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.5577228Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.5577470Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:10:24.5577689Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:10:24.5578078Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:10:24.5578464Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:10:24.5578691Z [E ProcessGroupGloo.cpp:2803] [Rank 0]: Rank 1 failed to pass monitoredBarrier in 100 ms 2023-01-11T22:10:24.5578919Z [E ProcessGroupGloo.cpp:138] [Rank 0]: Ranks 1 failed to pass monitoredBarrier in 100 ms 2023-01-11T22:10:24.5579022Z ok (23.104s) 2023-01-11T22:10:24.5579042Z 2023-01-11T22:10:24.5579304Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5579419Z Ran 1 test in 23.104s 2023-01-11T22:10:24.5579438Z 2023-01-11T22:10:24.5579526Z OK 2023-01-11T22:10:24.5579545Z 2023-01-11T22:10:24.5579653Z Generating XML reports... 
2023-01-11T22:10:24.5580089Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220340.xml 2023-01-11T22:10:24.5580459Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5580631Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5581007Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5581282Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5581302Z 2023-01-11T22:10:24.5581409Z Running tests... 2023-01-11T22:10:24.5581674Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5581965Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5582367Z test_monitored_barrier_failure_order (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T22:10:24.5582387Z 2023-01-11T22:10:24.5582639Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5582741Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5582761Z 2023-01-11T22:10:24.5582861Z OK (skipped=1) 2023-01-11T22:10:24.5582880Z 2023-01-11T22:10:24.5582999Z Generating XML reports... 
2023-01-11T22:10:24.5583434Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220406.xml 2023-01-11T22:10:24.5583799Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5584026Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5584411Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5584583Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5584603Z 2023-01-11T22:10:24.5584706Z Running tests... 2023-01-11T22:10:24.5584963Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5585268Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5585655Z test_monitored_barrier_gloo (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T22:10:24.5585679Z 2023-01-11T22:10:24.5585932Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5586042Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5586061Z 2023-01-11T22:10:24.5586169Z OK (skipped=1) 2023-01-11T22:10:24.5586188Z 2023-01-11T22:10:24.5586312Z Generating XML reports... 
2023-01-11T22:10:24.5586730Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220408.xml 2023-01-11T22:10:24.5587095Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5587270Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5587637Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5587822Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5587841Z 2023-01-11T22:10:24.5587952Z Running tests... 2023-01-11T22:10:24.5588208Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5588516Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5588911Z test_monitored_barrier_gloo_rank_0_timeout (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T22:10:24.5588948Z 2023-01-11T22:10:24.5589189Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5589297Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5589317Z 2023-01-11T22:10:24.5589422Z OK (skipped=1) 2023-01-11T22:10:24.5589441Z 2023-01-11T22:10:24.5589559Z Generating XML reports... 
2023-01-11T22:10:24.5589988Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220411.xml 2023-01-11T22:10:24.5590347Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5590579Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5590953Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5591126Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5591159Z 2023-01-11T22:10:24.5591249Z Running tests... 2023-01-11T22:10:24.5591505Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5591802Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5592200Z test_monitored_barrier_gloo_subgroup (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T22:10:24.5592220Z 2023-01-11T22:10:24.5592469Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5592572Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5592595Z 2023-01-11T22:10:24.5592701Z OK (skipped=1) 2023-01-11T22:10:24.5592720Z 2023-01-11T22:10:24.5592842Z Generating XML reports... 
2023-01-11T22:10:24.5593311Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220413.xml 2023-01-11T22:10:24.5593683Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5593850Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5594221Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5594407Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5594427Z 2023-01-11T22:10:24.5594530Z Running tests... 2023-01-11T22:10:24.5603876Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5604275Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5604707Z test_monitored_barrier_wait_all_ranks (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T22:10:24.5604730Z 2023-01-11T22:10:24.5604978Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5605087Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5605106Z 2023-01-11T22:10:24.5605212Z OK (skipped=1) 2023-01-11T22:10:24.5605231Z 2023-01-11T22:10:24.5605350Z Generating XML reports... 
2023-01-11T22:10:24.5605794Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220416.xml 2023-01-11T22:10:24.5606161Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5606330Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5606701Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5606881Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5606909Z 2023-01-11T22:10:24.5607003Z Running tests... 2023-01-11T22:10:24.5607260Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5607567Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5608022Z test_nccl_backend_bool_allgather (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'nccl'} (0.002s) 2023-01-11T22:10:24.5608043Z 2023-01-11T22:10:24.5608300Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5608405Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5608425Z 2023-01-11T22:10:24.5608528Z OK (skipped=1) 2023-01-11T22:10:24.5608547Z 2023-01-11T22:10:24.5608661Z Generating XML reports... 
2023-01-11T22:10:24.5609213Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220418.xml 2023-01-11T22:10:24.5609585Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5609749Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5610122Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5610307Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5610327Z 2023-01-11T22:10:24.5610427Z Running tests... 2023-01-11T22:10:24.5610682Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5610979Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5611368Z test_nccl_backend_bool_allreduce (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'nccl'} (0.002s) 2023-01-11T22:10:24.5611392Z 2023-01-11T22:10:24.5611633Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5611737Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5611829Z 2023-01-11T22:10:24.5611935Z OK (skipped=1) 2023-01-11T22:10:24.5611955Z 2023-01-11T22:10:24.5612069Z Generating XML reports... 
2023-01-11T22:10:24.5612502Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220420.xml 2023-01-11T22:10:24.5612863Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5613029Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5613398Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5613580Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5613603Z 2023-01-11T22:10:24.5613695Z Running tests... 2023-01-11T22:10:24.5613945Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5614251Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5614644Z test_nccl_backend_bool_broadcast (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'nccl'} (0.002s) 2023-01-11T22:10:24.5614663Z 2023-01-11T22:10:24.5614913Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5615017Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5615036Z 2023-01-11T22:10:24.5615133Z OK (skipped=1) 2023-01-11T22:10:24.5615152Z 2023-01-11T22:10:24.5615267Z Generating XML reports... 
2023-01-11T22:10:24.5615704Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220423.xml 2023-01-11T22:10:24.5616055Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5616229Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5617027Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5617233Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5617254Z 2023-01-11T22:10:24.5617357Z Running tests... 2023-01-11T22:10:24.5617621Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5617924Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5618318Z test_nccl_backend_bool_reduce (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'nccl'} (0.003s) 2023-01-11T22:10:24.5618339Z 2023-01-11T22:10:24.5618593Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5618808Z Ran 1 test in 0.003s 2023-01-11T22:10:24.5618827Z 2023-01-11T22:10:24.5618929Z OK (skipped=1) 2023-01-11T22:10:24.5618949Z 2023-01-11T22:10:24.5619064Z Generating XML reports... 
2023-01-11T22:10:24.5619510Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220425.xml 2023-01-11T22:10:24.5619876Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5620049Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5620422Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5620611Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5620631Z 2023-01-11T22:10:24.5620723Z Running tests... 2023-01-11T22:10:24.5620977Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5621288Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5621646Z test_nccl_high_priority_stream (__main__.TestDistBackendWithSpawn) ... skip: Only NCCL backend supports high priority stream (0.002s) 2023-01-11T22:10:24.5621668Z 2023-01-11T22:10:24.5621933Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5622041Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5622061Z 2023-01-11T22:10:24.5622161Z OK (skipped=1) 2023-01-11T22:10:24.5622180Z 2023-01-11T22:10:24.5622295Z Generating XML reports... 
2023-01-11T22:10:24.5622727Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220428.xml 2023-01-11T22:10:24.5623075Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5623246Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5623623Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5623805Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5623827Z 2023-01-11T22:10:24.5623933Z Running tests... 2023-01-11T22:10:24.5624185Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5624485Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5624728Z test_new_subgroups (__main__.TestDistBackendWithSpawn) ... skip: Test requires world size of 4 (0.002s) 2023-01-11T22:10:24.5624748Z 2023-01-11T22:10:24.5624999Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5625093Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5625112Z 2023-01-11T22:10:24.5625215Z OK (skipped=1) 2023-01-11T22:10:24.5625233Z 2023-01-11T22:10:24.5625348Z Generating XML reports... 
2023-01-11T22:10:24.5625789Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220430.xml 2023-01-11T22:10:24.5626156Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5626329Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5626694Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5626878Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5626897Z 2023-01-11T22:10:24.5626988Z Running tests... 2023-01-11T22:10:24.5627244Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5627545Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5627807Z test_new_subgroups_by_enumeration (__main__.TestDistBackendWithSpawn) ... skip: Test requires world size of 4 (0.002s) 2023-01-11T22:10:24.5627884Z 2023-01-11T22:10:24.5628137Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5628243Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5628266Z 2023-01-11T22:10:24.5628368Z OK (skipped=1) 2023-01-11T22:10:24.5628387Z 2023-01-11T22:10:24.5628503Z Generating XML reports... 
2023-01-11T22:10:24.5628937Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220432.xml 2023-01-11T22:10:24.5629286Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5629457Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5629823Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5630004Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5630028Z 2023-01-11T22:10:24.5630131Z Running tests... 2023-01-11T22:10:24.5630380Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5630728Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5631035Z test_new_subgroups_by_enumeration_input_rank_exceeds_world_size (__main__.TestDistBackendWithSpawn) ... skip: Test requires world size of 4 (0.002s) 2023-01-11T22:10:24.5631055Z 2023-01-11T22:10:24.5631305Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5631399Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5631418Z 2023-01-11T22:10:24.5631518Z OK (skipped=1) 2023-01-11T22:10:24.5631538Z 2023-01-11T22:10:24.5631655Z Generating XML reports... 
2023-01-11T22:10:24.5632091Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220435.xml 2023-01-11T22:10:24.5632452Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5632629Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5633003Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5633189Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5633209Z 2023-01-11T22:10:24.5633311Z Running tests... 2023-01-11T22:10:24.5633552Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5633854Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5634148Z test_new_subgroups_by_enumeration_negative_input_rank (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5634363Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45050 2023-01-11T22:10:24.5634578Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45051 2023-01-11T22:10:24.5634942Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5635107Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5635469Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5635638Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5635996Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5636161Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5636530Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5636716Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5637021Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5637260Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5637657Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5638048Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5638257Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5638479Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5638578Z ok (4.272s) 2023-01-11T22:10:24.5638599Z 2023-01-11T22:10:24.5638855Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5638968Z Ran 1 test in 4.272s 2023-01-11T22:10:24.5638988Z 2023-01-11T22:10:24.5639075Z OK 2023-01-11T22:10:24.5639094Z 2023-01-11T22:10:24.5639214Z Generating XML reports... 2023-01-11T22:10:24.5639699Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220437.xml 2023-01-11T22:10:24.5640057Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5640228Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5640596Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5640778Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5640797Z 2023-01-11T22:10:24.5640902Z Running tests... 2023-01-11T22:10:24.5641153Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5641464Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5641751Z test_new_subgroups_group_size_exceeds_world_size (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5641960Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45153 2023-01-11T22:10:24.5642159Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45154 2023-01-11T22:10:24.5642515Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5642680Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5643052Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5643237Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5643590Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5643760Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5644132Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5644300Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5644540Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5644780Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5645173Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5645561Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5645841Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5646060Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5646161Z ok (4.254s) 2023-01-11T22:10:24.5646181Z 2023-01-11T22:10:24.5646443Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5646536Z Ran 1 test in 4.254s 2023-01-11T22:10:24.5646567Z 2023-01-11T22:10:24.5646642Z OK 2023-01-11T22:10:24.5646661Z 2023-01-11T22:10:24.5646781Z Generating XML reports... 2023-01-11T22:10:24.5647219Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220444.xml 2023-01-11T22:10:24.5647578Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5647750Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5648115Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5648299Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5648319Z 2023-01-11T22:10:24.5648470Z Running tests... 2023-01-11T22:10:24.5648719Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5649019Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5649285Z test_new_subgroups_overlap_not_allowed (__main__.TestDistBackendWithSpawn) ... 
skip: Test requires world size of 4 (0.002s) 2023-01-11T22:10:24.5649305Z 2023-01-11T22:10:24.5649553Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5649657Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5649676Z 2023-01-11T22:10:24.5649778Z OK (skipped=1) 2023-01-11T22:10:24.5649797Z 2023-01-11T22:10:24.5649910Z Generating XML reports... 2023-01-11T22:10:24.5650341Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220451.xml 2023-01-11T22:10:24.5650701Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5650859Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5651228Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5651410Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5651429Z 2023-01-11T22:10:24.5651533Z Running tests... 2023-01-11T22:10:24.5651783Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5652084Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5652372Z test_new_subgroups_world_size_not_divisible_by_group_size (__main__.TestDistBackendWithSpawn) ... skip: Test requires world size of 4 (0.002s) 2023-01-11T22:10:24.5652396Z 2023-01-11T22:10:24.5652645Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5652739Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5652770Z 2023-01-11T22:10:24.5652861Z OK (skipped=1) 2023-01-11T22:10:24.5652880Z 2023-01-11T22:10:24.5652998Z Generating XML reports... 
2023-01-11T22:10:24.5653430Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220453.xml 2023-01-11T22:10:24.5653787Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5653958Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5654323Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5654505Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5654579Z 2023-01-11T22:10:24.5654689Z Running tests... 2023-01-11T22:10:24.5654931Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5655240Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5655515Z test_output_unused_in_loss_dict_module (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5656259Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78112 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.625s) 2023-01-11T22:10:24.5656279Z 2023-01-11T22:10:24.5656790Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5656913Z Ran 1 test in 1.625s 2023-01-11T22:10:24.5656935Z 2023-01-11T22:10:24.5657044Z OK (skipped=1) 2023-01-11T22:10:24.5657064Z 2023-01-11T22:10:24.5657184Z Generating XML reports... 
2023-01-11T22:10:24.5657706Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220456.xml 2023-01-11T22:10:24.5658081Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5658239Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5658604Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5658789Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5658809Z 2023-01-11T22:10:24.5658911Z Running tests... 2023-01-11T22:10:24.5659168Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5659475Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5659751Z test_output_unused_in_loss_tuple_module (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5659968Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45356 2023-01-11T22:10:24.5660167Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45357 2023-01-11T22:10:24.5660530Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5660695Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5661062Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5661248Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5661603Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5661775Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5662147Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5662327Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5662553Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5662785Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5663177Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5663562Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5663787Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5664078Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5664350Z [1673474705.454943] [7c5487d9c02b:45357:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5664577Z [1673474705.468501] [7c5487d9c02b:45357:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5664806Z [1673474705.468501] [7c5487d9c02b:45357:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5665057Z [1673474705.454181] [7c5487d9c02b:45356:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5665279Z [1673474705.468015] [7c5487d9c02b:45356:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5665505Z [1673474705.468015] [7c5487d9c02b:45356:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5665603Z ok (6.632s) 2023-01-11T22:10:24.5665624Z 2023-01-11T22:10:24.5665932Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5666040Z Ran 1 test in 6.632s 2023-01-11T22:10:24.5666060Z 2023-01-11T22:10:24.5666149Z OK 2023-01-11T22:10:24.5666169Z 2023-01-11T22:10:24.5666290Z Generating XML reports... 
2023-01-11T22:10:24.5666730Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220500.xml 2023-01-11T22:10:24.5667077Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5667254Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5667626Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5667817Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5667836Z 2023-01-11T22:10:24.5667940Z Running tests... 2023-01-11T22:10:24.5668200Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5668508Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5668773Z test_periodic_model_averager (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5668990Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45474 2023-01-11T22:10:24.5669190Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45475 2023-01-11T22:10:24.5669552Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5669720Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5670094Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5670283Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5670648Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5670816Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5671185Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5671354Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5671594Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5671835Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5672227Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5672678Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5672903Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5673126Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5673394Z [1673474715.499548] [7c5487d9c02b:45474:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5673659Z [1673474715.508665] [7c5487d9c02b:45475:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5673884Z [1673474715.513518] [7c5487d9c02b:45474:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5674105Z [1673474715.513518] [7c5487d9c02b:45474:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5674383Z [1673474715.520131] [7c5487d9c02b:45475:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5674612Z [1673474715.520131] [7c5487d9c02b:45475:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5674710Z ok (7.112s) 2023-01-11T22:10:24.5674730Z 2023-01-11T22:10:24.5674998Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5675108Z Ran 1 test in 7.112s 2023-01-11T22:10:24.5675128Z 2023-01-11T22:10:24.5675217Z OK 2023-01-11T22:10:24.5675236Z 2023-01-11T22:10:24.5675353Z Generating XML reports... 
2023-01-11T22:10:24.5675791Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220509.xml 2023-01-11T22:10:24.5676148Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5676318Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5676695Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5676883Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5676903Z 2023-01-11T22:10:24.5677009Z Running tests... 2023-01-11T22:10:24.5677265Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5677570Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5677847Z test_periodic_model_averager_param_group (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5678046Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45589 2023-01-11T22:10:24.5678261Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45590 2023-01-11T22:10:24.5678629Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5678798Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5679173Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5679357Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5679713Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5679879Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5680243Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5680470Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5680705Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5680944Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5681339Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5681728Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5681948Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5682169Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5682434Z [1673474725.206191] [7c5487d9c02b:45589:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5682702Z [1673474725.215489] [7c5487d9c02b:45590:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5682959Z [1673474725.219904] [7c5487d9c02b:45589:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5683197Z [1673474725.219904] [7c5487d9c02b:45589:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5683412Z [1673474725.226906] [7c5487d9c02b:45590:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5683634Z [1673474725.226906] [7c5487d9c02b:45590:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5683727Z ok (7.212s) 2023-01-11T22:10:24.5683747Z 2023-01-11T22:10:24.5684009Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5684123Z Ran 1 test in 7.213s 2023-01-11T22:10:24.5684143Z 2023-01-11T22:10:24.5684232Z OK 2023-01-11T22:10:24.5684251Z 2023-01-11T22:10:24.5684370Z Generating XML reports... 
2023-01-11T22:10:24.5684796Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220518.xml 2023-01-11T22:10:24.5685156Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5685330Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5685702Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5685887Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5685906Z 2023-01-11T22:10:24.5686012Z Running tests... 2023-01-11T22:10:24.5686266Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5686573Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5686849Z test_post_localSGD_optimizer_parity (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5687573Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77123 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.639s) 2023-01-11T22:10:24.5687607Z 2023-01-11T22:10:24.5687850Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5687957Z Ran 1 test in 1.639s 2023-01-11T22:10:24.5687977Z 2023-01-11T22:10:24.5688084Z OK (skipped=1) 2023-01-11T22:10:24.5688103Z 2023-01-11T22:10:24.5688222Z Generating XML reports... 
2023-01-11T22:10:24.5688657Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220528.xml 2023-01-11T22:10:24.5689084Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5689256Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5689627Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5689808Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5689829Z 2023-01-11T22:10:24.5689919Z Running tests... 2023-01-11T22:10:24.5690171Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5690473Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5690759Z test_post_localSGD_optimizer_parity_grad_is_view (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5691535Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77292 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.596s) 2023-01-11T22:10:24.5691558Z 2023-01-11T22:10:24.5691817Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5691924Z Ran 1 test in 1.596s 2023-01-11T22:10:24.5691943Z 2023-01-11T22:10:24.5692042Z OK (skipped=1) 2023-01-11T22:10:24.5692061Z 2023-01-11T22:10:24.5692178Z Generating XML reports... 
2023-01-11T22:10:24.5692600Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220532.xml 2023-01-11T22:10:24.5692963Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5693133Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5693508Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5693698Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5693718Z 2023-01-11T22:10:24.5693824Z Running tests... 2023-01-11T22:10:24.5694078Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5694377Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5694680Z test_post_localSGD_optimizer_parity_with_hierarchical_sgd (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5694882Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45772 2023-01-11T22:10:24.5695089Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45773 2023-01-11T22:10:24.5695448Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5695622Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5695992Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5696170Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5696755Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5696946Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5697313Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5697495Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5697729Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5698059Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5698457Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5698840Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5699063Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5699283Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5699426Z skip: Need at least 4 CUDA devices (4.269s) 2023-01-11T22:10:24.5699446Z 2023-01-11T22:10:24.5699690Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5699799Z Ran 1 test in 4.269s 2023-01-11T22:10:24.5699818Z 2023-01-11T22:10:24.5699919Z OK (skipped=1) 2023-01-11T22:10:24.5699942Z 2023-01-11T22:10:24.5700062Z Generating XML reports... 2023-01-11T22:10:24.5700569Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220537.xml 2023-01-11T22:10:24.5700942Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5701114Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5701485Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5701669Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5701690Z 2023-01-11T22:10:24.5701781Z Running tests... 2023-01-11T22:10:24.5702033Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5702336Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5702660Z test_post_localSGD_optimizer_parity_with_hierarchical_sgd_grad_is_view (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5702873Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45875 2023-01-11T22:10:24.5703082Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45876 2023-01-11T22:10:24.5703440Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5703612Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5703967Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5704148Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5704508Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5704678Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5705046Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5705228Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5705466Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5705703Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5706090Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5706463Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5706682Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5706962Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5707105Z skip: Need at least 4 CUDA devices (4.311s) 2023-01-11T22:10:24.5707125Z 2023-01-11T22:10:24.5707389Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5707496Z Ran 1 test in 4.312s 2023-01-11T22:10:24.5707516Z 2023-01-11T22:10:24.5707617Z OK (skipped=1) 2023-01-11T22:10:24.5707636Z 2023-01-11T22:10:24.5707801Z Generating XML reports... 2023-01-11T22:10:24.5708247Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220543.xml 2023-01-11T22:10:24.5708595Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5708769Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5709138Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5709325Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5709345Z 2023-01-11T22:10:24.5709501Z Running tests... 2023-01-11T22:10:24.5709765Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5710070Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5710349Z test_post_localSGD_optimizer_step_reload (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5711083Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/84886 for platform(s) linux. 
If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.612s) 2023-01-11T22:10:24.5711105Z 2023-01-11T22:10:24.5711359Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5711458Z Ran 1 test in 1.613s 2023-01-11T22:10:24.5711478Z 2023-01-11T22:10:24.5711582Z OK (skipped=1) 2023-01-11T22:10:24.5711601Z 2023-01-11T22:10:24.5711724Z Generating XML reports... 2023-01-11T22:10:24.5712165Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220550.xml 2023-01-11T22:10:24.5712531Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5712702Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5713074Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5713259Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5713279Z 2023-01-11T22:10:24.5713370Z Running tests... 2023-01-11T22:10:24.5713621Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5713931Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5714189Z test_reduce_full_group_max (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5714403Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46012 2023-01-11T22:10:24.5714609Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46013 2023-01-11T22:10:24.5714966Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5715130Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5715502Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5715672Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5716094Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5716266Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5716638Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5716820Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5717056Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5717292Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5717683Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5718055Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5718281Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5718558Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:10:24.5718780Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5719006Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:10:24.5719395Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.5719782Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.5720113Z STAGE:2023-01-11 22:05:58 46013:46013 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5720427Z STAGE:2023-01-11 22:05:58 46012:46012 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5720710Z [1673474758.833193] [7c5487d9c02b:46013:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5720925Z [1673474760.455679] [7c5487d9c02b:46013:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5721156Z [1673474760.455679] [7c5487d9c02b:46013:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5721418Z [1673474758.812223] [7c5487d9c02b:46012:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.5721639Z [1673474760.482602] [7c5487d9c02b:46012:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5721865Z [1673474760.482602] [7c5487d9c02b:46012:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5722415Z STAGE:2023-01-11 22:06:00 46013:46013 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 22:06:00 46012:46012 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5722436Z 2023-01-11T22:10:24.5722773Z STAGE:2023-01-11 22:06:00 46013:46013 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5723113Z STAGE:2023-01-11 22:06:00 46012:46012 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5723434Z STAGE:2023-01-11 22:06:00 46013:46013 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5723747Z STAGE:2023-01-11 22:06:00 46012:46012 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5724060Z STAGE:2023-01-11 22:06:00 46013:46013 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5724392Z STAGE:2023-01-11 22:06:00 46013:46013 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5724814Z STAGE:2023-01-11 22:06:00 46012:46012 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5725148Z STAGE:2023-01-11 22:06:00 46012:46012 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5725243Z ok (6.650s) 2023-01-11T22:10:24.5725261Z 2023-01-11T22:10:24.5725515Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5725624Z Ran 1 test in 6.650s 2023-01-11T22:10:24.5725643Z 2023-01-11T22:10:24.5725730Z OK 2023-01-11T22:10:24.5725748Z 2023-01-11T22:10:24.5725869Z Generating XML reports... 
2023-01-11T22:10:24.5726298Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220554.xml 2023-01-11T22:10:24.5726661Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5726841Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5727259Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5727448Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5727468Z 2023-01-11T22:10:24.5727567Z Running tests... 2023-01-11T22:10:24.5727825Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5728125Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5728368Z test_reduce_full_group_min (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5728582Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46126 2023-01-11T22:10:24.5728795Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46127 2023-01-11T22:10:24.5729158Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5729334Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5729704Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5729883Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5730240Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5730405Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5730763Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5730954Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5731191Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5731430Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5731826Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5732213Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5732434Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5732665Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:10:24.5732880Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5733094Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:10:24.5733478Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.5733918Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.5734243Z STAGE:2023-01-11 22:06:08 46127:46127 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5734563Z STAGE:2023-01-11 22:06:08 46126:46126 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5734836Z [1673474768.057584] [7c5487d9c02b:46127:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5735064Z [1673474769.703668] [7c5487d9c02b:46127:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5735295Z [1673474769.703668] [7c5487d9c02b:46127:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5735556Z [1673474768.037568] [7c5487d9c02b:46126:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.5735844Z [1673474769.668922] [7c5487d9c02b:46126:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5736068Z [1673474769.668922] [7c5487d9c02b:46126:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5736849Z STAGE:2023-01-11 22:06:10 46127:46127 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 22:06:10 46126:46126 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5736873Z 2023-01-11T22:10:24.5737236Z STAGE:2023-01-11 22:06:10 46127:46127 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5737578Z STAGE:2023-01-11 22:06:10 46126:46126 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5737906Z STAGE:2023-01-11 22:06:10 46126:46126 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5738236Z STAGE:2023-01-11 22:06:10 46126:46126 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5738569Z STAGE:2023-01-11 22:06:10 46126:46126 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5738889Z STAGE:2023-01-11 22:06:10 46127:46127 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5739213Z STAGE:2023-01-11 22:06:10 46127:46127 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5739545Z STAGE:2023-01-11 22:06:10 46127:46127 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5739631Z ok (6.754s) 2023-01-11T22:10:24.5739651Z 2023-01-11T22:10:24.5739914Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5740026Z Ran 1 test in 6.754s 2023-01-11T22:10:24.5740048Z 2023-01-11T22:10:24.5740134Z OK 2023-01-11T22:10:24.5740152Z 2023-01-11T22:10:24.5740267Z Generating XML reports... 
2023-01-11T22:10:24.5740711Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220604.xml 2023-01-11T22:10:24.5741082Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5741255Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5741610Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5741794Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5741813Z 2023-01-11T22:10:24.5741917Z Running tests... 2023-01-11T22:10:24.5742168Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5742475Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5742831Z test_reduce_full_group_product (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5743054Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46240 2023-01-11T22:10:24.5743265Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46241 2023-01-11T22:10:24.5743623Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5743794Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5744164Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5744344Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5744701Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5744874Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5745300Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5745490Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5745730Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5745957Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5746355Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5746741Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5746965Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5747203Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:10:24.5747424Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5747653Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:10:24.5748045Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.5748421Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.5748737Z STAGE:2023-01-11 22:06:17 46241:46241 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5749054Z STAGE:2023-01-11 22:06:17 46240:46240 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5749324Z [1673474777.391886] [7c5487d9c02b:46241:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5749555Z [1673474779.052666] [7c5487d9c02b:46241:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5749790Z [1673474779.052666] [7c5487d9c02b:46241:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5750056Z [1673474777.371778] [7c5487d9c02b:46240:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.5750277Z [1673474779.003643] [7c5487d9c02b:46240:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5750506Z [1673474779.003643] [7c5487d9c02b:46240:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5751050Z STAGE:2023-01-11 22:06:19 46241:46241 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 22:06:19 46240:46240 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5751123Z 2023-01-11T22:10:24.5751476Z STAGE:2023-01-11 22:06:19 46241:46241 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5751816Z STAGE:2023-01-11 22:06:19 46240:46240 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5752124Z STAGE:2023-01-11 22:06:19 46241:46241 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5752436Z STAGE:2023-01-11 22:06:19 46240:46240 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5752760Z STAGE:2023-01-11 22:06:19 46241:46241 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5753076Z STAGE:2023-01-11 22:06:19 46240:46240 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5753415Z STAGE:2023-01-11 22:06:19 46241:46241 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5753796Z STAGE:2023-01-11 22:06:19 46240:46240 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5753896Z ok (6.643s) 2023-01-11T22:10:24.5753916Z 2023-01-11T22:10:24.5754175Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5754277Z Ran 1 test in 6.643s 2023-01-11T22:10:24.5754297Z 2023-01-11T22:10:24.5754371Z OK 2023-01-11T22:10:24.5754390Z 2023-01-11T22:10:24.5754504Z Generating XML reports... 
2023-01-11T22:10:24.5754944Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220613.xml 2023-01-11T22:10:24.5755308Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5755480Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5755853Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5756041Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5756060Z 2023-01-11T22:10:24.5756165Z Running tests... 2023-01-11T22:10:24.5756407Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5756716Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5756973Z test_reduce_full_group_sum (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5757186Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46354 2023-01-11T22:10:24.5757394Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46355 2023-01-11T22:10:24.5757752Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5757924Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5758295Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5758479Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5758823Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5758986Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5759350Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5759532Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5759770Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5760011Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5760471Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5760863Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5761085Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5761305Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:10:24.5761519Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5761742Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:10:24.5762123Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.5762453Z STAGE:2023-01-11 22:06:26 46355:46355 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5762946Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.5763282Z STAGE:2023-01-11 22:06:26 46354:46354 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5763557Z [1673474786.465649] [7c5487d9c02b:46354:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5763785Z [1673474788.089974] [7c5487d9c02b:46354:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5764003Z [1673474788.089974] [7c5487d9c02b:46354:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5764269Z [1673474786.485794] [7c5487d9c02b:46355:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.5764506Z [1673474788.132183] [7c5487d9c02b:46355:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5764740Z [1673474788.132183] [7c5487d9c02b:46355:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5765293Z STAGE:2023-01-11 22:06:28 46354:46354 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 22:06:28 46355:46355 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5765314Z 2023-01-11T22:10:24.5765656Z STAGE:2023-01-11 22:06:28 46355:46355 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5765990Z STAGE:2023-01-11 22:06:28 46354:46354 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5766304Z STAGE:2023-01-11 22:06:28 46355:46355 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5766620Z STAGE:2023-01-11 22:06:28 46354:46354 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5766942Z STAGE:2023-01-11 22:06:28 46355:46355 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5767249Z STAGE:2023-01-11 22:06:28 46354:46354 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5767581Z STAGE:2023-01-11 22:06:28 46355:46355 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5767913Z STAGE:2023-01-11 22:06:28 46354:46354 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5768017Z ok (6.619s) 2023-01-11T22:10:24.5768037Z 2023-01-11T22:10:24.5768299Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5768412Z Ran 1 test in 6.620s 2023-01-11T22:10:24.5768432Z 2023-01-11T22:10:24.5768523Z OK 2023-01-11T22:10:24.5768595Z 2023-01-11T22:10:24.5768722Z Generating XML reports... 
2023-01-11T22:10:24.5769167Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220622.xml 2023-01-11T22:10:24.5769518Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5769688Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5770064Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5770251Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5770271Z 2023-01-11T22:10:24.5770379Z Running tests... 2023-01-11T22:10:24.5770640Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5770942Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5771197Z test_reduce_group_max (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5771398Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46468 2023-01-11T22:10:24.5771660Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46469 2023-01-11T22:10:24.5772035Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5772208Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5772576Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5772759Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5773118Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5773287Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5773660Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5773834Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5774071Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5774308Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5774695Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5775083Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5775304Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5775523Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5775677Z skip: Skipped due to small world size. (4.243s) 2023-01-11T22:10:24.5775697Z 2023-01-11T22:10:24.5775957Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5776052Z Ran 1 test in 4.243s 2023-01-11T22:10:24.5776071Z 2023-01-11T22:10:24.5776172Z OK (skipped=1) 2023-01-11T22:10:24.5776191Z 2023-01-11T22:10:24.5776309Z Generating XML reports... 2023-01-11T22:10:24.5776979Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220631.xml 2023-01-11T22:10:24.5777364Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5777538Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5777907Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5778184Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5778203Z 2023-01-11T22:10:24.5778301Z Running tests... 2023-01-11T22:10:24.5778555Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5778857Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5779101Z test_reduce_group_min (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5779308Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46571 2023-01-11T22:10:24.5779512Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46572 2023-01-11T22:10:24.5779867Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5780029Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5780393Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5780566Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5780978Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5781149Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5781520Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5781704Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5781941Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5782177Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5782567Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5782959Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5783172Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5783393Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5783546Z skip: Skipped due to small world size. (4.256s) 2023-01-11T22:10:24.5783566Z 2023-01-11T22:10:24.5783827Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5783935Z Ran 1 test in 4.256s 2023-01-11T22:10:24.5783955Z 2023-01-11T22:10:24.5784059Z OK (skipped=1) 2023-01-11T22:10:24.5784079Z 2023-01-11T22:10:24.5784202Z Generating XML reports... 2023-01-11T22:10:24.5784635Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220638.xml 2023-01-11T22:10:24.5784987Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5785162Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5785526Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5785707Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5785726Z 2023-01-11T22:10:24.5785829Z Running tests... 2023-01-11T22:10:24.5786079Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5786384Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5786642Z test_reduce_group_product (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5786857Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46674 2023-01-11T22:10:24.5787114Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46675 2023-01-11T22:10:24.5787481Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5787653Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5788025Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5788208Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5788568Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5788735Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5789103Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5789277Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5789514Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5789801Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5790200Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5790590Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5790817Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5791041Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5791198Z skip: Skipped due to small world size. (4.272s) 2023-01-11T22:10:24.5791218Z 2023-01-11T22:10:24.5791484Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5791577Z Ran 1 test in 4.273s 2023-01-11T22:10:24.5791597Z 2023-01-11T22:10:24.5791700Z OK (skipped=1) 2023-01-11T22:10:24.5791719Z 2023-01-11T22:10:24.5791841Z Generating XML reports... 2023-01-11T22:10:24.5792281Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220645.xml 2023-01-11T22:10:24.5792646Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5792817Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5793185Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5793364Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5793384Z 2023-01-11T22:10:24.5793491Z Running tests... 2023-01-11T22:10:24.5793738Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5794038Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5794288Z test_reduce_group_sum (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5794501Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46777 2023-01-11T22:10:24.5794709Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46778 2023-01-11T22:10:24.5795071Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5795237Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5795600Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5795769Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5796187Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5796355Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5796728Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5796911Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5797147Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5797387Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5797778Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5798165Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5798379Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5798658Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5798814Z skip: Skipped due to small world size. (4.262s) 2023-01-11T22:10:24.5798834Z 2023-01-11T22:10:24.5799092Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5799193Z Ran 1 test in 4.262s 2023-01-11T22:10:24.5799213Z 2023-01-11T22:10:24.5799316Z OK (skipped=1) 2023-01-11T22:10:24.5799335Z 2023-01-11T22:10:24.5799454Z Generating XML reports... 2023-01-11T22:10:24.5799897Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220652.xml 2023-01-11T22:10:24.5800265Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5800428Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5800797Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5800984Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5801004Z 2023-01-11T22:10:24.5801108Z Running tests... 2023-01-11T22:10:24.5801365Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5801663Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5801903Z test_reduce_max (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5802118Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46880 2023-01-11T22:10:24.5802314Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46881 2023-01-11T22:10:24.5802675Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5802851Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5803222Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5803404Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5803756Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5803925Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5804291Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5804471Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5804697Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5804992Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5805388Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5805773Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5805991Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5806208Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5806535Z STAGE:2023-01-11 22:07:02 46881:46881 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5806855Z STAGE:2023-01-11 22:07:02 46880:46880 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5807127Z [1673474822.939321] [7c5487d9c02b:46881:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5807391Z [1673474824.560300] [7c5487d9c02b:46881:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5807636Z [1673474824.560300] [7c5487d9c02b:46881:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5807959Z [1673474822.918518] [7c5487d9c02b:46880:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.5808188Z [1673474824.589870] [7c5487d9c02b:46880:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5808421Z [1673474824.589870] [7c5487d9c02b:46880:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5808970Z STAGE:2023-01-11 22:07:04 46881:46881 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 22:07:04 46880:46880 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5808996Z 2023-01-11T22:10:24.5809343Z STAGE:2023-01-11 22:07:04 46881:46881 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5809683Z STAGE:2023-01-11 22:07:04 46880:46880 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5810005Z STAGE:2023-01-11 22:07:05 46881:46881 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5810319Z STAGE:2023-01-11 22:07:05 46880:46880 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5810630Z STAGE:2023-01-11 22:07:05 46881:46881 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5810953Z STAGE:2023-01-11 22:07:05 46880:46880 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5811291Z STAGE:2023-01-11 22:07:05 46881:46881 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5811627Z STAGE:2023-01-11 22:07:05 46880:46880 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5811726Z ok (6.673s) 2023-01-11T22:10:24.5811745Z 2023-01-11T22:10:24.5812001Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5812111Z Ran 1 test in 6.673s 2023-01-11T22:10:24.5812131Z 2023-01-11T22:10:24.5812219Z OK 2023-01-11T22:10:24.5812239Z 2023-01-11T22:10:24.5812356Z Generating XML reports... 
2023-01-11T22:10:24.5812782Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220659.xml 2023-01-11T22:10:24.5813142Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5813311Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5813679Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5813921Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5813941Z 2023-01-11T22:10:24.5814044Z Running tests... 2023-01-11T22:10:24.5814303Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5814604Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5814832Z test_reduce_min (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5815044Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46994 2023-01-11T22:10:24.5815249Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46995 2023-01-11T22:10:24.5815612Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5815783Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5816153Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5816380Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5816994Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5817167Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5817533Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5817712Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5817946Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5818177Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5818573Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5818961Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5819184Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5819513Z STAGE:2023-01-11 22:07:12 46994:46994 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5819735Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5820046Z STAGE:2023-01-11 22:07:12 46995:46995 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5820315Z [1673474832.198159] [7c5487d9c02b:46995:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5820542Z [1673474833.858972] [7c5487d9c02b:46995:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5820778Z [1673474833.858972] [7c5487d9c02b:46995:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5821048Z [1673474832.178160] [7c5487d9c02b:46994:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.5821269Z [1673474833.821248] [7c5487d9c02b:46994:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5821501Z [1673474833.821248] [7c5487d9c02b:46994:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5822039Z STAGE:2023-01-11 22:07:14 46995:46995 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 22:07:14 46994:46994 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5822145Z 2023-01-11T22:10:24.5822498Z STAGE:2023-01-11 22:07:14 46995:46995 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5822839Z STAGE:2023-01-11 22:07:14 46994:46994 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5823144Z STAGE:2023-01-11 22:07:14 46994:46994 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5823470Z STAGE:2023-01-11 22:07:14 46994:46994 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5823801Z STAGE:2023-01-11 22:07:14 46994:46994 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5824121Z STAGE:2023-01-11 22:07:14 46995:46995 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5824449Z STAGE:2023-01-11 22:07:14 46995:46995 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5824785Z STAGE:2023-01-11 22:07:14 46995:46995 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5824891Z ok (6.727s) 2023-01-11T22:10:24.5824910Z 2023-01-11T22:10:24.5825237Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5825352Z Ran 1 test in 6.727s 2023-01-11T22:10:24.5825372Z 2023-01-11T22:10:24.5825446Z OK 2023-01-11T22:10:24.5825465Z 2023-01-11T22:10:24.5825583Z Generating XML reports... 
2023-01-11T22:10:24.5826027Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220708.xml 2023-01-11T22:10:24.5826389Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5826558Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5826927Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5827116Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5827136Z 2023-01-11T22:10:24.5827237Z Running tests... 2023-01-11T22:10:24.5827490Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5827781Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5828047Z test_reduce_multigpu (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl backend supports reduce multigpu (0.002s) 2023-01-11T22:10:24.5828067Z 2023-01-11T22:10:24.5828319Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5828425Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5828445Z 2023-01-11T22:10:24.5828551Z OK (skipped=1) 2023-01-11T22:10:24.5828570Z 2023-01-11T22:10:24.5828689Z Generating XML reports... 
2023-01-11T22:10:24.5829123Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220717.xml 2023-01-11T22:10:24.5829490Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5829661Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5830023Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5830206Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5830226Z 2023-01-11T22:10:24.5830332Z Running tests... 2023-01-11T22:10:24.5830594Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5830901Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5831154Z test_reduce_product (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5831371Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47141 2023-01-11T22:10:24.5831639Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 47142 2023-01-11T22:10:24.5831994Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5832163Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5832533Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5832715Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5833070Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5833239Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5833606Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5833786Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5834015Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5834301Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5834701Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5835088Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5835314Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5835643Z STAGE:2023-01-11 22:07:23 47142:47142 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5835865Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5836190Z STAGE:2023-01-11 22:07:23 47141:47141 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5836467Z [1673474843.703259] [7c5487d9c02b:47141:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5836692Z [1673474845.360430] [7c5487d9c02b:47141:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5836910Z [1673474845.360430] [7c5487d9c02b:47141:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5837174Z [1673474843.723193] [7c5487d9c02b:47142:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.5837395Z [1673474845.351202] [7c5487d9c02b:47142:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5837623Z [1673474845.351202] [7c5487d9c02b:47142:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5838168Z STAGE:2023-01-11 22:07:25 47141:47141 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 22:07:25 47142:47142 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5838189Z 2023-01-11T22:10:24.5838528Z STAGE:2023-01-11 22:07:25 47141:47141 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5838862Z STAGE:2023-01-11 22:07:25 47142:47142 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5839177Z STAGE:2023-01-11 22:07:25 47142:47142 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5839490Z STAGE:2023-01-11 22:07:25 47141:47141 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5839814Z STAGE:2023-01-11 22:07:25 47142:47142 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5840177Z STAGE:2023-01-11 22:07:25 47141:47141 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5840514Z STAGE:2023-01-11 22:07:25 47142:47142 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5840841Z STAGE:2023-01-11 22:07:25 47141:47141 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5840938Z ok (6.530s) 2023-01-11T22:10:24.5840958Z 2023-01-11T22:10:24.5841213Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5841318Z Ran 1 test in 6.530s 2023-01-11T22:10:24.5841338Z 2023-01-11T22:10:24.5841424Z OK 2023-01-11T22:10:24.5841444Z 2023-01-11T22:10:24.5841564Z Generating XML reports... 
2023-01-11T22:10:24.5842006Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220719.xml 2023-01-11T22:10:24.5842357Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5842532Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5842952Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5843141Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5843161Z 2023-01-11T22:10:24.5843266Z Running tests... 2023-01-11T22:10:24.5843528Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5843830Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5844118Z test_reduce_scatter_tensor_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA reduce_scatter_tensor (0.002s) 2023-01-11T22:10:24.5844137Z 2023-01-11T22:10:24.5844391Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5844485Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5844509Z 2023-01-11T22:10:24.5844614Z OK (skipped=1) 2023-01-11T22:10:24.5844634Z 2023-01-11T22:10:24.5844753Z Generating XML reports... 
2023-01-11T22:10:24.5845194Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220728.xml 2023-01-11T22:10:24.5845557Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5845729Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5846099Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5846285Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5846305Z 2023-01-11T22:10:24.5846396Z Running tests... 2023-01-11T22:10:24.5846657Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5846962Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5847232Z test_reduce_scatter_v_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports reduce_scatter_v (0.003s) 2023-01-11T22:10:24.5847255Z 2023-01-11T22:10:24.5847511Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5847620Z Ran 1 test in 0.003s 2023-01-11T22:10:24.5847639Z 2023-01-11T22:10:24.5847741Z OK (skipped=1) 2023-01-11T22:10:24.5847761Z 2023-01-11T22:10:24.5847879Z Generating XML reports... 
2023-01-11T22:10:24.5848312Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220731.xml 2023-01-11T22:10:24.5848663Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5848829Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5849201Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5849440Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5849460Z 2023-01-11T22:10:24.5849569Z Running tests... 2023-01-11T22:10:24.5849823Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5850125Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5850366Z test_reduce_sum (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5850566Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47321 2023-01-11T22:10:24.5850780Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 47322 2023-01-11T22:10:24.5851142Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5851315Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5851692Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5851919Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5852284Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5852453Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5852826Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5852994Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5853233Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5853477Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5853877Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5854266Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5854497Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5854831Z STAGE:2023-01-11 22:07:37 47321:47321 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5855058Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5855385Z STAGE:2023-01-11 22:07:37 47322:47322 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5855639Z [1673474857.524959] [7c5487d9c02b:47321:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5855865Z [1673474859.159904] [7c5487d9c02b:47321:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5856102Z [1673474859.159904] [7c5487d9c02b:47321:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5856367Z [1673474857.544885] [7c5487d9c02b:47322:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.5856820Z [1673474859.203986] [7c5487d9c02b:47322:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5857062Z [1673474859.203986] [7c5487d9c02b:47322:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5857615Z STAGE:2023-01-11 22:07:39 47321:47321 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 22:07:39 47322:47322 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5857719Z 2023-01-11T22:10:24.5858071Z STAGE:2023-01-11 22:07:39 47321:47321 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5858415Z STAGE:2023-01-11 22:07:39 47322:47322 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5858739Z STAGE:2023-01-11 22:07:39 47322:47322 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5859060Z STAGE:2023-01-11 22:07:39 47321:47321 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5859372Z STAGE:2023-01-11 22:07:39 47322:47322 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5859919Z STAGE:2023-01-11 22:07:39 47322:47322 ActivityProfilerController.cpp:310] Completed Stage: Post ProcessingSTAGE:2023-01-11 22:07:39 47321:47321 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5859940Z 2023-01-11T22:10:24.5860272Z STAGE:2023-01-11 22:07:39 47321:47321 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5860376Z ok (6.610s) 2023-01-11T22:10:24.5860396Z 2023-01-11T22:10:24.5860654Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5860857Z Ran 1 test in 6.610s 2023-01-11T22:10:24.5860878Z 2023-01-11T22:10:24.5860969Z OK 2023-01-11T22:10:24.5860989Z 2023-01-11T22:10:24.5861111Z Generating XML reports... 
2023-01-11T22:10:24.5861557Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220733.xml 2023-01-11T22:10:24.5861906Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5862071Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5862439Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5862620Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5862645Z 2023-01-11T22:10:24.5862749Z Running tests... 2023-01-11T22:10:24.5863010Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5863322Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5863576Z test_reduce_sum_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA reduce (0.002s) 2023-01-11T22:10:24.5863596Z 2023-01-11T22:10:24.5863854Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5863947Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5863967Z 2023-01-11T22:10:24.5864070Z OK (skipped=1) 2023-01-11T22:10:24.5864089Z 2023-01-11T22:10:24.5864206Z Generating XML reports... 
2023-01-11T22:10:24.5864644Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220742.xml 2023-01-11T22:10:24.5865003Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5865178Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5865551Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5865735Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5865755Z 2023-01-11T22:10:24.5865846Z Running tests... 2023-01-11T22:10:24.5866103Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5866403Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5866659Z test_reduce_sum_cuda_twice (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA reduce (0.002s) 2023-01-11T22:10:24.5866679Z 2023-01-11T22:10:24.5866931Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5867088Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5867107Z 2023-01-11T22:10:24.5867209Z OK (skipped=1) 2023-01-11T22:10:24.5867228Z 2023-01-11T22:10:24.5867342Z Generating XML reports... 
2023-01-11T22:10:24.5867784Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220745.xml 2023-01-11T22:10:24.5868135Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5868309Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5868680Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5868867Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5868886Z 2023-01-11T22:10:24.5868985Z Running tests... 2023-01-11T22:10:24.5869239Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5869540Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5869828Z test_reduce_sum_twice (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5870046Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47501 2023-01-11T22:10:24.5870245Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 47502 2023-01-11T22:10:24.5870609Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5870778Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5871142Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5871324Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5871686Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5871860Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5872232Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5872403Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5872640Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5872876Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5873267Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5873654Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5873874Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5874208Z STAGE:2023-01-11 22:07:51 47501:47501 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5874434Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5874755Z STAGE:2023-01-11 22:07:51 47502:47502 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5875012Z [1673474871.440188] [7c5487d9c02b:47501:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5875239Z [1673474873.073402] [7c5487d9c02b:47501:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5875470Z [1673474873.073402] [7c5487d9c02b:47501:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5875735Z [1673474871.460353] [7c5487d9c02b:47502:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.5876009Z [1673474873.082008] [7c5487d9c02b:47502:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5876238Z [1673474873.082008] [7c5487d9c02b:47502:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5876786Z STAGE:2023-01-11 22:07:53 47501:47501 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 22:07:53 47502:47502 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5876807Z 2023-01-11T22:10:24.5877146Z STAGE:2023-01-11 22:07:53 47502:47502 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5877480Z STAGE:2023-01-11 22:07:53 47501:47501 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5877794Z STAGE:2023-01-11 22:07:53 47502:47502 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5878106Z STAGE:2023-01-11 22:07:53 47501:47501 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5878458Z STAGE:2023-01-11 22:07:53 47502:47502 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5878781Z STAGE:2023-01-11 22:07:53 47501:47501 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5879119Z STAGE:2023-01-11 22:07:53 47502:47502 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5879451Z STAGE:2023-01-11 22:07:53 47501:47501 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5879548Z ok (6.559s) 2023-01-11T22:10:24.5879567Z 2023-01-11T22:10:24.5879825Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5879934Z Ran 1 test in 6.560s 2023-01-11T22:10:24.5879954Z 2023-01-11T22:10:24.5880043Z OK 2023-01-11T22:10:24.5880062Z 2023-01-11T22:10:24.5880169Z Generating XML reports... 
2023-01-11T22:10:24.5880609Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220747.xml 2023-01-11T22:10:24.5880973Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5881143Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5881512Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5881698Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5881717Z 2023-01-11T22:10:24.5881824Z Running tests... 2023-01-11T22:10:24.5882085Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5882384Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5882629Z test_scatter (__main__.TestDistBackendWithSpawn) ... skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T22:10:24.5882649Z 2023-01-11T22:10:24.5882903Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5883013Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5883033Z 2023-01-11T22:10:24.5883138Z OK (skipped=1) 2023-01-11T22:10:24.5883157Z 2023-01-11T22:10:24.5883277Z Generating XML reports... 
2023-01-11T22:10:24.5883711Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220756.xml 2023-01-11T22:10:24.5884075Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5884249Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5884619Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5884846Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5884866Z 2023-01-11T22:10:24.5884970Z Running tests... 2023-01-11T22:10:24.5885234Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5885542Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5885801Z test_scatter_checks (__main__.TestDistBackendWithSpawn) ... skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T22:10:24.5885821Z 2023-01-11T22:10:24.5886074Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5886181Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5886201Z 2023-01-11T22:10:24.5886304Z OK (skipped=1) 2023-01-11T22:10:24.5886323Z 2023-01-11T22:10:24.5886428Z Generating XML reports... 
2023-01-11T22:10:24.5886861Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220759.xml 2023-01-11T22:10:24.5887224Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5887438Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5887814Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5887999Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5888019Z 2023-01-11T22:10:24.5888120Z Running tests... 2023-01-11T22:10:24.5888373Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5888671Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5888920Z test_scatter_complex (__main__.TestDistBackendWithSpawn) ... skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T22:10:24.5888951Z 2023-01-11T22:10:24.5889191Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5889299Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5889319Z 2023-01-11T22:10:24.5889419Z OK (skipped=1) 2023-01-11T22:10:24.5889438Z 2023-01-11T22:10:24.5889561Z Generating XML reports... 
2023-01-11T22:10:24.5889991Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220801.xml 2023-01-11T22:10:24.5890350Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5890515Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5890886Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5891057Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5891077Z 2023-01-11T22:10:24.5891182Z Running tests... 2023-01-11T22:10:24.5891435Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5891743Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5891998Z test_scatter_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA gather (0.002s) 2023-01-11T22:10:24.5892018Z 2023-01-11T22:10:24.5892274Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5892382Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5892401Z 2023-01-11T22:10:24.5892507Z OK (skipped=1) 2023-01-11T22:10:24.5892526Z 2023-01-11T22:10:24.5892647Z Generating XML reports... 
2023-01-11T22:10:24.5893067Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220803.xml 2023-01-11T22:10:24.5893431Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5893603Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5894038Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5894223Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5894243Z 2023-01-11T22:10:24.5894346Z Running tests... 2023-01-11T22:10:24.5894597Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5894895Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5895139Z test_scatter_cuda_complex (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA gather (0.002s) 2023-01-11T22:10:24.5895167Z 2023-01-11T22:10:24.5895404Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5895508Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5895527Z 2023-01-11T22:10:24.5895629Z OK (skipped=1) 2023-01-11T22:10:24.5895648Z 2023-01-11T22:10:24.5895768Z Generating XML reports... 
2023-01-11T22:10:24.5896198Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220806.xml 2023-01-11T22:10:24.5896852Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5897040Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5897424Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5897595Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5897624Z 2023-01-11T22:10:24.5897714Z Running tests... 2023-01-11T22:10:24.5897964Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5898262Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5898525Z test_scatter_full_group (__main__.TestDistBackendWithSpawn) ... skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T22:10:24.5898550Z 2023-01-11T22:10:24.5898803Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5898914Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5898934Z 2023-01-11T22:10:24.5899036Z OK (skipped=1) 2023-01-11T22:10:24.5899055Z 2023-01-11T22:10:24.5899172Z Generating XML reports... 
2023-01-11T22:10:24.5899596Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220808.xml 2023-01-11T22:10:24.5899952Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5900119Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5900491Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5900672Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5900696Z 2023-01-11T22:10:24.5900800Z Running tests... 2023-01-11T22:10:24.5901052Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5901353Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5901599Z test_scatter_group (__main__.TestDistBackendWithSpawn) ... skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T22:10:24.5901629Z 2023-01-11T22:10:24.5901874Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5901980Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5901999Z 2023-01-11T22:10:24.5902095Z OK (skipped=1) 2023-01-11T22:10:24.5902114Z 2023-01-11T22:10:24.5902229Z Generating XML reports... 
2023-01-11T22:10:24.5902659Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220811.xml 2023-01-11T22:10:24.5903015Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5903268Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5903643Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5903814Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5903842Z 2023-01-11T22:10:24.5903934Z Running tests... 2023-01-11T22:10:24.5904182Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5904480Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5904850Z test_scatter_object_list (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T22:10:24.5904870Z 2023-01-11T22:10:24.5905125Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5905237Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5905257Z 2023-01-11T22:10:24.5905357Z OK (skipped=1) 2023-01-11T22:10:24.5905377Z 2023-01-11T22:10:24.5905490Z Generating XML reports... 
2023-01-11T22:10:24.5905954Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220813.xml 2023-01-11T22:10:24.5906317Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5906486Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5906850Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5907037Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5907056Z 2023-01-11T22:10:24.5907164Z Running tests... 2023-01-11T22:10:24.5907420Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5907724Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5908000Z test_send_recv (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5908214Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47879 2023-01-11T22:10:24.5908421Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 47880 2023-01-11T22:10:24.5908785Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5908955Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5909324Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5909502Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5909859Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5910029Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5910388Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5910571Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5910807Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5911047Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5911443Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5911835Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5912062Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5912350Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5912623Z [1673474899.813201] [7c5487d9c02b:47880:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5912837Z [1673474901.232846] [7c5487d9c02b:47880:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5913074Z [1673474901.232846] [7c5487d9c02b:47880:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5913340Z [1673474899.812913] [7c5487d9c02b:47879:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5913564Z [1673474901.223473] [7c5487d9c02b:47879:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5913797Z [1673474901.223473] [7c5487d9c02b:47879:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5913889Z ok (6.137s) 2023-01-11T22:10:24.5913955Z 2023-01-11T22:10:24.5914225Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5914333Z Ran 1 test in 6.137s 2023-01-11T22:10:24.5914353Z 2023-01-11T22:10:24.5914439Z OK 2023-01-11T22:10:24.5914459Z 2023-01-11T22:10:24.5914565Z Generating XML reports... 
2023-01-11T22:10:24.5914998Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220815.xml 2023-01-11T22:10:24.5915358Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5915528Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5915897Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5916082Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5916102Z 2023-01-11T22:10:24.5916206Z Running tests... 2023-01-11T22:10:24.5916466Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5916762Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5917022Z test_send_recv_any_source (__main__.TestDistBackendWithSpawn) ... skip: ucc does not support send/recv from any source (0.002s) 2023-01-11T22:10:24.5917054Z 2023-01-11T22:10:24.5917293Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5917399Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5917419Z 2023-01-11T22:10:24.5917519Z OK (skipped=1) 2023-01-11T22:10:24.5917538Z 2023-01-11T22:10:24.5917655Z Generating XML reports... 
2023-01-11T22:10:24.5918084Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220824.xml 2023-01-11T22:10:24.5918457Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5918635Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5919006Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5919176Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5919196Z 2023-01-11T22:10:24.5919299Z Running tests... 2023-01-11T22:10:24.5919555Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5919859Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5920166Z test_send_recv_any_source_autograd_profiler (__main__.TestDistBackendWithSpawn) ... skip: ucc does not support send/recv from any source (0.002s) 2023-01-11T22:10:24.5920238Z 2023-01-11T22:10:24.5920496Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5920607Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5920630Z 2023-01-11T22:10:24.5920739Z OK (skipped=1) 2023-01-11T22:10:24.5920758Z 2023-01-11T22:10:24.5920877Z Generating XML reports... 
2023-01-11T22:10:24.5921298Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220827.xml 2023-01-11T22:10:24.5921663Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5921834Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5922204Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5922391Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5922413Z 2023-01-11T22:10:24.5922516Z Running tests... 2023-01-11T22:10:24.5922774Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5923136Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5923423Z test_send_recv_any_source_torch_profiler (__main__.TestDistBackendWithSpawn) ... skip: ucc does not support send/recv from any source (0.002s) 2023-01-11T22:10:24.5923458Z 2023-01-11T22:10:24.5923702Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5923809Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5923829Z 2023-01-11T22:10:24.5923928Z OK (skipped=1) 2023-01-11T22:10:24.5923948Z 2023-01-11T22:10:24.5924067Z Generating XML reports... 
2023-01-11T22:10:24.5924495Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220829.xml 2023-01-11T22:10:24.5924850Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5925017Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5925382Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5925552Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5925580Z 2023-01-11T22:10:24.5925671Z Running tests... 2023-01-11T22:10:24.5925919Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5926216Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5926479Z test_send_recv_autograd_profiler (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5926689Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48088 2023-01-11T22:10:24.5926898Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 48089 2023-01-11T22:10:24.5927265Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5927443Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5927800Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5927987Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5928346Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5928515Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5928878Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5929059Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5929354Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5929598Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5929979Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5930368Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5930589Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5930918Z STAGE:2023-01-11 22:08:35 48088:48088 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5931142Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5931464Z STAGE:2023-01-11 22:08:35 48089:48089 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5931737Z [1673474915.769259] [7c5487d9c02b:48089:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5932011Z [1673474917.443357] [7c5487d9c02b:48089:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5932252Z [1673474917.443357] [7c5487d9c02b:48089:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5932586Z STAGE:2023-01-11 22:08:37 48089:48089 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5932835Z [1673474915.769013] [7c5487d9c02b:48088:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.5933055Z [1673474917.417825] [7c5487d9c02b:48088:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5933284Z [1673474917.417825] [7c5487d9c02b:48088:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5933620Z STAGE:2023-01-11 22:08:37 48088:48088 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5933963Z STAGE:2023-01-11 22:08:37 48089:48089 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5934300Z STAGE:2023-01-11 22:08:37 48088:48088 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5934394Z ok (6.775s) 2023-01-11T22:10:24.5934414Z 2023-01-11T22:10:24.5934669Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5934776Z Ran 1 test in 6.775s 2023-01-11T22:10:24.5934796Z 2023-01-11T22:10:24.5934871Z OK 2023-01-11T22:10:24.5934890Z 2023-01-11T22:10:24.5935008Z Generating XML reports... 2023-01-11T22:10:24.5935446Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220831.xml 2023-01-11T22:10:24.5935813Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5935985Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5936356Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5936775Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5936802Z 2023-01-11T22:10:24.5936914Z Running tests... 
2023-01-11T22:10:24.5937179Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5937475Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5937706Z test_send_recv_nccl (__main__.TestDistBackendWithSpawn) ... skip: NCCL Send Recv Only (0.002s) 2023-01-11T22:10:24.5937726Z 2023-01-11T22:10:24.5938073Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5938180Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5938200Z 2023-01-11T22:10:24.5938302Z OK (skipped=1) 2023-01-11T22:10:24.5938325Z 2023-01-11T22:10:24.5938441Z Generating XML reports... 2023-01-11T22:10:24.5938881Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220841.xml 2023-01-11T22:10:24.5939248Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5939404Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5939775Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5939956Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5939977Z 2023-01-11T22:10:24.5940078Z Running tests... 2023-01-11T22:10:24.5940332Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5940630Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5940942Z test_send_recv_nccl_autograd_profiler (__main__.TestDistBackendWithSpawn) ... 
skip: NCCL Send Recv Only (0.002s) 2023-01-11T22:10:24.5940966Z 2023-01-11T22:10:24.5941223Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5941327Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5941346Z 2023-01-11T22:10:24.5941436Z OK (skipped=1) 2023-01-11T22:10:24.5941455Z 2023-01-11T22:10:24.5941570Z Generating XML reports... 2023-01-11T22:10:24.5942004Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220843.xml 2023-01-11T22:10:24.5942366Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5942543Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5942913Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5943098Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5943118Z 2023-01-11T22:10:24.5943221Z Running tests... 2023-01-11T22:10:24.5943474Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5943763Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5944012Z test_send_recv_nccl_torch_profiler (__main__.TestDistBackendWithSpawn) ... skip: NCCL Send Recv Only (0.002s) 2023-01-11T22:10:24.5944031Z 2023-01-11T22:10:24.5944285Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5944391Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5944411Z 2023-01-11T22:10:24.5944513Z OK (skipped=1) 2023-01-11T22:10:24.5944536Z 2023-01-11T22:10:24.5944651Z Generating XML reports... 
2023-01-11T22:10:24.5945086Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220846.xml 2023-01-11T22:10:24.5945452Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5945609Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5945981Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5946166Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5946186Z 2023-01-11T22:10:24.5946288Z Running tests... 2023-01-11T22:10:24.5946545Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5946851Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5947168Z test_send_recv_torch_profiler (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5947378Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48301 2023-01-11T22:10:24.5947590Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 48302 2023-01-11T22:10:24.5947945Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5948111Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5948479Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5948661Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5949013Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5949179Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5949548Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5949774Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5950008Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5950242Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5950640Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5951028Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5951255Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5951585Z STAGE:2023-01-11 22:08:52 48302:48302 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5951814Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5952143Z STAGE:2023-01-11 22:08:52 48301:48301 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5952412Z [1673474932.353971] [7c5487d9c02b:48302:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5952639Z [1673474933.971784] [7c5487d9c02b:48302:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5952857Z [1673474933.971784] [7c5487d9c02b:48302:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5953186Z STAGE:2023-01-11 22:08:54 48302:48302 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5953450Z [1673474932.332871] [7c5487d9c02b:48301:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.5953680Z [1673474933.986994] [7c5487d9c02b:48301:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5953912Z [1673474933.986994] [7c5487d9c02b:48301:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5954244Z STAGE:2023-01-11 22:08:54 48301:48301 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5954585Z STAGE:2023-01-11 22:08:54 48302:48302 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5954924Z STAGE:2023-01-11 22:08:54 48301:48301 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5955022Z ok (6.738s) 2023-01-11T22:10:24.5955043Z 2023-01-11T22:10:24.5955285Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5955450Z Ran 1 test in 6.738s 2023-01-11T22:10:24.5955470Z 2023-01-11T22:10:24.5955559Z OK 2023-01-11T22:10:24.5955578Z 2023-01-11T22:10:24.5955700Z Generating XML reports... 2023-01-11T22:10:24.5956145Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220848.xml 2023-01-11T22:10:24.5956508Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5956677Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5957042Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5957222Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5957242Z 2023-01-11T22:10:24.5957334Z Running tests... 
2023-01-11T22:10:24.5957591Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5957896Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5958191Z test_send_recv_with_tag (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5958407Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48415 2023-01-11T22:10:24.5958614Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 48416 2023-01-11T22:10:24.5958971Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5959139Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5959494Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5959674Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5960029Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5960206Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5960580Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5960765Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5961007Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5961246Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5961640Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5962016Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5962240Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5962458Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5962732Z [1673474941.689837] [7c5487d9c02b:48416:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5962958Z [1673474943.120971] [7c5487d9c02b:48416:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5963191Z [1673474943.120971] [7c5487d9c02b:48416:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5963456Z [1673474941.669719] [7c5487d9c02b:48415:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5963678Z [1673474943.090319] [7c5487d9c02b:48415:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5963965Z [1673474943.090319] [7c5487d9c02b:48415:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5964070Z ok (6.262s) 2023-01-11T22:10:24.5964090Z 2023-01-11T22:10:24.5964343Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5964451Z Ran 1 test in 6.263s 2023-01-11T22:10:24.5964470Z 2023-01-11T22:10:24.5964560Z OK 2023-01-11T22:10:24.5964580Z 2023-01-11T22:10:24.5964700Z Generating XML reports... 
2023-01-11T22:10:24.5965141Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220857.xml 2023-01-11T22:10:24.5965506Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5965681Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5966054Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5966230Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5966261Z 2023-01-11T22:10:24.5966400Z Running tests... 2023-01-11T22:10:24.5966663Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5966965Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5967245Z test_send_recv_with_tag_autograd_profiler (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5967457Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48525 2023-01-11T22:10:24.5967663Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 48526 2023-01-11T22:10:24.5968017Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5968185Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5968542Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5968725Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5969084Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5969251Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5969621Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5969803Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5970040Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5970273Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5970652Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5971040Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5971263Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5971589Z STAGE:2023-01-11 22:09:10 48526:48526 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5971812Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5972134Z STAGE:2023-01-11 22:09:10 48525:48525 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5972397Z [1673474950.348775] [7c5487d9c02b:48525:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5972676Z [1673474952.002182] [7c5487d9c02b:48525:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5972907Z [1673474952.002182] [7c5487d9c02b:48525:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5973239Z STAGE:2023-01-11 22:09:12 48525:48525 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5973490Z [1673474950.369674] [7c5487d9c02b:48526:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.5973712Z [1673474951.982042] [7c5487d9c02b:48526:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5973940Z [1673474951.982042] [7c5487d9c02b:48526:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5974269Z STAGE:2023-01-11 22:09:12 48526:48526 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5974611Z STAGE:2023-01-11 22:09:12 48525:48525 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5975000Z STAGE:2023-01-11 22:09:12 48526:48526 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5975099Z ok (6.615s) 2023-01-11T22:10:24.5975120Z 2023-01-11T22:10:24.5975378Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5975486Z Ran 1 test in 6.615s 2023-01-11T22:10:24.5975506Z 2023-01-11T22:10:24.5975581Z OK 2023-01-11T22:10:24.5975599Z 2023-01-11T22:10:24.5975713Z Generating XML reports... 2023-01-11T22:10:24.5976149Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220906.xml 2023-01-11T22:10:24.5976508Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5976905Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5977291Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5977486Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5977506Z 2023-01-11T22:10:24.5977612Z Running tests... 
2023-01-11T22:10:24.5977856Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5978164Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5978437Z test_send_recv_with_tag_torch_profiler (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5978649Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48639 2023-01-11T22:10:24.5978858Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 48640 2023-01-11T22:10:24.5979225Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5979397Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5979768Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5979955Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5980303Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5980470Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5980843Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5981025Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5981260Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5981586Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5981984Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5982369Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5982593Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.5982909Z STAGE:2023-01-11 22:09:19 48639:48639 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5983130Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.5983454Z STAGE:2023-01-11 22:09:19 48640:48640 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:10:24.5983721Z [1673474959.563862] [7c5487d9c02b:48639:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.5984030Z [1673474961.204542] [7c5487d9c02b:48639:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5984271Z [1673474961.204542] [7c5487d9c02b:48639:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5984606Z STAGE:2023-01-11 22:09:21 48639:48639 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5984865Z [1673474959.583989] [7c5487d9c02b:48640:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.5985084Z [1673474961.213873] [7c5487d9c02b:48640:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.5985307Z [1673474961.213873] [7c5487d9c02b:48640:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.5985630Z STAGE:2023-01-11 22:09:21 48640:48640 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:10:24.5985969Z STAGE:2023-01-11 22:09:21 48639:48639 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5986307Z STAGE:2023-01-11 22:09:21 48640:48640 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:10:24.5986401Z ok (6.653s) 2023-01-11T22:10:24.5986421Z 2023-01-11T22:10:24.5986672Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5986777Z Ran 1 test in 6.653s 2023-01-11T22:10:24.5986797Z 2023-01-11T22:10:24.5986885Z OK 2023-01-11T22:10:24.5986904Z 2023-01-11T22:10:24.5987026Z Generating XML reports... 2023-01-11T22:10:24.5987452Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220915.xml 2023-01-11T22:10:24.5987822Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5987988Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5988361Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5988542Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5988562Z 2023-01-11T22:10:24.5988660Z Running tests... 
2023-01-11T22:10:24.5988911Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5989211Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5989478Z test_sparse_all_reduce_sum (__main__.TestDistBackendWithSpawn) ... skip: Only Gloo backend support sparse all reduce (0.002s) 2023-01-11T22:10:24.5989498Z 2023-01-11T22:10:24.5989737Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5989901Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5989921Z 2023-01-11T22:10:24.5990027Z OK (skipped=1) 2023-01-11T22:10:24.5990046Z 2023-01-11T22:10:24.5990168Z Generating XML reports... 2023-01-11T22:10:24.5990600Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220924.xml 2023-01-11T22:10:24.5990957Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5991127Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5991496Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5991681Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5991701Z 2023-01-11T22:10:24.5991791Z Running tests... 2023-01-11T22:10:24.5992045Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5992351Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5992680Z test_sparse_all_reduce_sum_cuda (__main__.TestDistBackendWithSpawn) ... 
skip: Only Gloo backend support sparse all reduce (0.002s) 2023-01-11T22:10:24.5992702Z 2023-01-11T22:10:24.5992963Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5993068Z Ran 1 test in 0.002s 2023-01-11T22:10:24.5993087Z 2023-01-11T22:10:24.5993187Z OK (skipped=1) 2023-01-11T22:10:24.5993206Z 2023-01-11T22:10:24.5993324Z Generating XML reports... 2023-01-11T22:10:24.5993743Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220927.xml 2023-01-11T22:10:24.5994102Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5994270Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5994646Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5994830Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5994848Z 2023-01-11T22:10:24.5994952Z Running tests... 2023-01-11T22:10:24.5995206Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.5995506Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.5995762Z test_stateless_api_with_ddp (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.5995964Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48819 2023-01-11T22:10:24.5996175Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 48820 2023-01-11T22:10:24.5996534Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5996704Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5997075Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5997259Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5997616Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.5997788Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.5998158Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.5998328Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.5998568Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.5998892Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.5999289Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.5999675Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.5999898Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.6000123Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.6000391Z [1673474974.800109] [7c5487d9c02b:48820:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.6000617Z [1673474974.813529] [7c5487d9c02b:48820:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.6000839Z [1673474974.813529] [7c5487d9c02b:48820:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.6001140Z [1673474974.790384] [7c5487d9c02b:48819:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.6001363Z [1673474974.804113] [7c5487d9c02b:48819:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.6001591Z [1673474974.804113] [7c5487d9c02b:48819:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.6001686Z ok (6.606s) 2023-01-11T22:10:24.6001707Z 2023-01-11T22:10:24.6001964Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.6002069Z Ran 1 test in 6.606s 2023-01-11T22:10:24.6002088Z 2023-01-11T22:10:24.6002176Z OK 2023-01-11T22:10:24.6002195Z 2023-01-11T22:10:24.6002311Z Generating XML reports... 
2023-01-11T22:10:24.6002737Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220929.xml 2023-01-11T22:10:24.6003099Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.6003267Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.6003632Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.6003815Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.6003835Z 2023-01-11T22:10:24.6003938Z Running tests... 2023-01-11T22:10:24.6004189Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.6004490Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.6004734Z test_static_graph_api_cpu (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.6004949Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48937 2023-01-11T22:10:24.6005162Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 48938 2023-01-11T22:10:24.6005526Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.6005692Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.6006061Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.6006245Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.6006600Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.6006770Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.6007182Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.6007362Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.6007603Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.6007889Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.6008285Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.6008671Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.6008890Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.6009106Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.6009378Z [1673474982.570262] [7c5487d9c02b:48937:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.6009637Z [1673474984.018592] [7c5487d9c02b:48937:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.6009877Z [1673474984.018592] [7c5487d9c02b:48937:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.6010138Z [1673474982.590953] [7c5487d9c02b:48938:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.6010357Z [1673474983.990172] [7c5487d9c02b:48938:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.6010583Z [1673474983.990172] [7c5487d9c02b:48938:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.6010680Z ok (6.106s) 2023-01-11T22:10:24.6010700Z 2023-01-11T22:10:24.6010962Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.6011064Z Ran 1 test in 6.106s 2023-01-11T22:10:24.6011087Z 2023-01-11T22:10:24.6011171Z OK 2023-01-11T22:10:24.6011191Z 2023-01-11T22:10:24.6011297Z Generating XML reports... 
2023-01-11T22:10:24.6011730Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220938.xml 2023-01-11T22:10:24.6012089Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.6012254Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.6012625Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.6012812Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.6012835Z 2023-01-11T22:10:24.6012940Z Running tests... 2023-01-11T22:10:24.6013198Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.6013505Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.6013787Z test_sync_bn_logged (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl & Gloo backend support DistributedDataParallel (0.002s) 2023-01-11T22:10:24.6013822Z 2023-01-11T22:10:24.6014063Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.6014165Z Ran 1 test in 0.002s 2023-01-11T22:10:24.6014185Z 2023-01-11T22:10:24.6014284Z OK (skipped=1) 2023-01-11T22:10:24.6014304Z 2023-01-11T22:10:24.6014419Z Generating XML reports... 
2023-01-11T22:10:24.6014851Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220947.xml 2023-01-11T22:10:24.6015216Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.6015433Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.6015811Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.6015984Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.6016005Z 2023-01-11T22:10:24.6016108Z Running tests... 2023-01-11T22:10:24.6016358Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.6017073Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.6017370Z test_undefined_grad_parity_unused_parameters (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.6017582Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 49084 2023-01-11T22:10:24.6017795Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 49085 2023-01-11T22:10:24.6018175Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.6018404Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.6018794Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.6018973Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.6019330Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.6019494Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.6019861Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.6020043Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.6020284Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.6020523Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.6020905Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.6021292Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.6021514Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.6021732Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.6022002Z [1673474995.034606] [7c5487d9c02b:49085:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.6022229Z [1673474995.047934] [7c5487d9c02b:49085:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.6022461Z [1673474995.047934] [7c5487d9c02b:49085:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.6023228Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2023-01-11T22:10:24.6023495Z [1673474995.029821] [7c5487d9c02b:49084:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.6023792Z [1673474995.043414] [7c5487d9c02b:49084:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.6024019Z [1673474995.043414] [7c5487d9c02b:49084:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.6024780Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. 
This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2023-01-11T22:10:24.6024878Z ok (6.624s) 2023-01-11T22:10:24.6024898Z 2023-01-11T22:10:24.6025158Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.6025258Z Ran 1 test in 6.624s 2023-01-11T22:10:24.6025278Z 2023-01-11T22:10:24.6025359Z OK 2023-01-11T22:10:24.6025379Z 2023-01-11T22:10:24.6025494Z Generating XML reports... 2023-01-11T22:10:24.6025978Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220949.xml 2023-01-11T22:10:24.6026352Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.6026524Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.6026894Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.6027076Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.6027096Z 2023-01-11T22:10:24.6027187Z Running tests... 2023-01-11T22:10:24.6027440Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.6027751Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.6028030Z test_verify_model_across_rank_with_logger (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.6028244Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 49202 2023-01-11T22:10:24.6028455Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 49203 2023-01-11T22:10:24.6028819Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.6028987Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.6029356Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.6029527Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.6029889Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.6030056Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.6030424Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.6030604Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.6030840Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.6031078Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.6031468Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.6031840Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.6032118Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.6032340Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.6032574Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:10:24.6032810Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:10:24.6033192Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.6033570Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.6033803Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:10:24.6034039Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:10:24.6034407Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:10:24.6034836Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:10:24.6035113Z [1673475004.286017] [7c5487d9c02b:49203:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.6035341Z [1673475004.299247] [7c5487d9c02b:49203:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.6035573Z [1673475004.299247] [7c5487d9c02b:49203:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.6035946Z [1673475009.648414] [7c5487d9c02b:49203:0] tag_match.c:62 UCX WARN unexpected tag-receive descriptor 0x366b4e00 was not matched 2023-01-11T22:10:24.6036210Z [1673475004.278595] [7c5487d9c02b:49202:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.6036434Z [1673475004.292255] [7c5487d9c02b:49202:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.6036665Z [1673475004.292255] [7c5487d9c02b:49202:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.6036971Z [1673475009.617306] [7c5487d9c02b:49202:1] ucc_schedule.h:189 UCC WARN timeout 5 sec. has expired on req 0x35a049c0, seq_num 5, TL_UCP, team_id 1, size 2, rank 0, ctx_rank 0: Barrier n/a inplace=0 bytes=0 2023-01-11T22:10:24.6037231Z [1673475009.658517] [7c5487d9c02b:49202:0] mpool.c:55 UCX WARN object 0x35b40200 {flags:0x20040 recv length 0 host memory} was not returned to mpool ucp_requests 2023-01-11T22:10:24.6037316Z ok (11.208s) 2023-01-11T22:10:24.6037350Z 2023-01-11T22:10:24.6037599Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.6037708Z Ran 1 test in 11.208s 2023-01-11T22:10:24.6037727Z 2023-01-11T22:10:24.6037815Z OK 2023-01-11T22:10:24.6037838Z 2023-01-11T22:10:24.6037955Z Generating XML reports... 
2023-01-11T22:10:24.6038390Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220958.xml 2023-01-11T22:10:24.6038751Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.6038921Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.6039294Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.6039466Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.6039486Z 2023-01-11T22:10:24.6039591Z Running tests... 2023-01-11T22:10:24.6039907Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.6040212Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:10:24.6040494Z test_verify_model_across_rank_without_logger (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:10:24.6040705Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 49322 2023-01-11T22:10:24.6040913Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 49323 2023-01-11T22:10:24.6041271Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.6041429Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.6041795Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.6041980Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.6042340Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:10:24.6042567Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:10:24.6042944Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:10:24.6043123Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:10:24.6043356Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:10:24.6043593Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:10:24.6043970Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:10:24.6044352Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:10:24.6044576Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:10:24.6044801Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:10:24.6045032Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:10:24.6045264Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:10:24.6045648Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.6046022Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:10:24.6046257Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:10:24.6046482Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:10:24.6046862Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:10:24.6047238Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:10:24.6047500Z [1673475017.866322] [7c5487d9c02b:49323:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:10:24.6047724Z [1673475017.879347] [7c5487d9c02b:49323:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.6047959Z [1673475017.879347] [7c5487d9c02b:49323:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.6048327Z [1673475023.233705] [7c5487d9c02b:49323:0] tag_match.c:62 UCX WARN unexpected tag-receive descriptor 0x3844f980 was not matched 2023-01-11T22:10:24.6048650Z [1673475017.859710] [7c5487d9c02b:49322:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:10:24.6048867Z [1673475017.873115] [7c5487d9c02b:49322:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:10:24.6049088Z [1673475017.873115] [7c5487d9c02b:49322:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:10:24.6049391Z [1673475023.202630] [7c5487d9c02b:49322:1] ucc_schedule.h:189 UCC WARN timeout 5 sec. has expired on req 0x3648ab00, seq_num 5, TL_UCP, team_id 1, size 2, rank 0, ctx_rank 0: Barrier n/a inplace=0 bytes=0 2023-01-11T22:10:24.6049642Z [1673475023.233601] [7c5487d9c02b:49322:0] mpool.c:55 UCX WARN object 0x3659c000 {flags:0x20040 recv length 0 host memory} was not returned to mpool ucp_requests 2023-01-11T22:10:24.6049739Z ok (11.004s) 2023-01-11T22:10:24.6049760Z 2023-01-11T22:10:24.6050017Z ---------------------------------------------------------------------- 2023-01-11T22:10:24.6050122Z Ran 1 test in 11.005s 2023-01-11T22:10:24.6050186Z 2023-01-11T22:10:24.6050276Z OK 2023-01-11T22:10:24.6050295Z 2023-01-11T22:10:24.6050415Z Generating XML reports... 
2023-01-11T22:10:24.6050856Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221012.xml 2023-01-11T22:10:24.6050875Z 2023-01-11T22:10:24.6051278Z ##[endgroup] 2023-01-11T22:10:24.6051738Z FINISHED PRINTING LOG FILE of distributed/test_distributed_spawn (/var/lib/jenkins/workspace/test/test-reports/distributed-test_distributed_spawn_q8dvsh36) 2023-01-11T22:10:24.6051758Z 2023-01-11T22:10:24.6052022Z Running distributed/pipeline/sync/test_worker ... [2023-01-11 22:10:24.381895] 2023-01-11T22:10:24.6052403Z Executing ['/opt/conda/bin/python', '-bb', '-m', 'pytest', 'distributed/pipeline/sync/test_worker.py', '-v'] ... [2023-01-11 22:10:24.382227] 2023-01-11T22:10:27.3004469Z 2023-01-11T22:10:27.3004964Z Expand the folded group to see the log file of distributed/pipeline/sync/test_worker 2023-01-11T22:10:27.3005942Z ##[group]PRINTING LOG FILE of distributed/pipeline/sync/test_worker (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-test_worker_8xp3kaz4) 2023-01-11T22:10:27.3006447Z ============================= test session starts ============================== 2023-01-11T22:10:27.3007051Z platform linux -- Python 3.10.8, pytest-7.2.0, pluggy-1.0.0 -- /opt/conda/bin/python 2023-01-11T22:10:27.3007464Z cachedir: .pytest_cache 2023-01-11T22:10:27.3008031Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2023-01-11T22:10:27.3008448Z torch: 2.0.0a0+git8419ddd 2023-01-11T22:10:27.3008791Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2023-01-11T22:10:27.3009365Z plugins: hypothesis-5.35.1, flakefinder-1.1.0, rerunfailures-10.3, shard-0.1.2, xdist-3.1.0, xdoctest-1.1.0 2023-01-11T22:10:27.3009742Z collecting ... 
collected 6 items 2023-01-11T22:10:27.3010599Z Running 6 items in this shard: test/distributed/pipeline/sync/test_worker.py::test_compute_multithreading, test/distributed/pipeline/sync/test_worker.py::test_compute_success, test/distributed/pipeline/sync/test_worker.py::test_compute_exception, test/distributed/pipeline/sync/test_worker.py::test_grad_mode[True], test/distributed/pipeline/sync/test_worker.py::test_grad_mode[False], test/distributed/pipeline/sync/test_worker.py::test_worker_per_device 2023-01-11T22:10:27.3011293Z 2023-01-11T22:10:27.3011524Z distributed/pipeline/sync/test_worker.py::test_compute_multithreading PASSED [ 16%] 2023-01-11T22:10:27.3011970Z distributed/pipeline/sync/test_worker.py::test_compute_success PASSED [ 33%] 2023-01-11T22:10:27.3012403Z distributed/pipeline/sync/test_worker.py::test_compute_exception PASSED [ 50%] 2023-01-11T22:10:27.3013055Z distributed/pipeline/sync/test_worker.py::test_grad_mode[True] PASSED [ 66%] 2023-01-11T22:10:27.3013470Z distributed/pipeline/sync/test_worker.py::test_grad_mode[False] PASSED [ 83%] 2023-01-11T22:10:27.3013899Z distributed/pipeline/sync/test_worker.py::test_worker_per_device PASSED [100%] 2023-01-11T22:10:27.3014146Z 2023-01-11T22:10:27.3014307Z ============================== 6 passed in 0.07s =============================== 2023-01-11T22:10:27.3014482Z 2023-01-11T22:10:27.3014790Z ##[endgroup] 2023-01-11T22:10:27.3015432Z FINISHED PRINTING LOG FILE of distributed/pipeline/sync/test_worker (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-test_worker_8xp3kaz4) 2023-01-11T22:10:27.3015804Z 2023-01-11T22:10:27.3016089Z Running distributed/pipeline/sync/test_pipeline ... [2023-01-11 22:10:27.300489] 2023-01-11T22:10:27.3017350Z Executing ['/opt/conda/bin/python', '-bb', '-m', 'pytest', 'distributed/pipeline/sync/test_pipeline.py', '-v'] ... 
[2023-01-11 22:10:27.300732] 2023-01-11T22:10:29.6715596Z 2023-01-11T22:10:29.6716353Z Expand the folded group to see the log file of distributed/pipeline/sync/test_pipeline 2023-01-11T22:10:29.6718338Z ##[group]PRINTING LOG FILE of distributed/pipeline/sync/test_pipeline (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-test_pipeline_zlrjn8aw) 2023-01-11T22:10:29.6718942Z ============================= test session starts ============================== 2023-01-11T22:10:29.6719532Z platform linux -- Python 3.10.8, pytest-7.2.0, pluggy-1.0.0 -- /opt/conda/bin/python 2023-01-11T22:10:29.6719885Z cachedir: .pytest_cache 2023-01-11T22:10:29.6720431Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2023-01-11T22:10:29.6720860Z torch: 2.0.0a0+git8419ddd 2023-01-11T22:10:29.6721183Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2023-01-11T22:10:29.6721746Z plugins: hypothesis-5.35.1, flakefinder-1.1.0, rerunfailures-10.3, shard-0.1.2, xdist-3.1.0, xdoctest-1.1.0 2023-01-11T22:10:29.6722136Z collecting ... collected 1 item 2023-01-11T22:10:29.6722541Z Running 1 items in this shard: test/distributed/pipeline/sync/test_pipeline.py::test_clock_cycles 2023-01-11T22:10:29.6722815Z 2023-01-11T22:10:29.6723031Z distributed/pipeline/sync/test_pipeline.py::test_clock_cycles PASSED [100%] 2023-01-11T22:10:29.6723279Z 2023-01-11T22:10:29.6723419Z ============================== 1 passed in 0.03s =============================== 2023-01-11T22:10:29.6723610Z 2023-01-11T22:10:29.6723924Z ##[endgroup] 2023-01-11T22:10:29.6724561Z FINISHED PRINTING LOG FILE of distributed/pipeline/sync/test_pipeline (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-test_pipeline_zlrjn8aw) 2023-01-11T22:10:29.6724945Z 2023-01-11T22:10:29.6725234Z Running distributed/pipeline/sync/test_microbatch ... 
[2023-01-11 22:10:29.671562] 2023-01-11T22:10:29.6725843Z Executing ['/opt/conda/bin/python', '-bb', '-m', 'pytest', 'distributed/pipeline/sync/test_microbatch.py', '-v'] ... [2023-01-11 22:10:29.671834] 2023-01-11T22:10:32.0629997Z 2023-01-11T22:10:32.0631016Z Expand the folded group to see the log file of distributed/pipeline/sync/test_microbatch 2023-01-11T22:10:32.0632551Z ##[group]PRINTING LOG FILE of distributed/pipeline/sync/test_microbatch (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-test_microbatch_9tnr4ziz) 2023-01-11T22:10:32.0633097Z ============================= test session starts ============================== 2023-01-11T22:10:32.0633859Z platform linux -- Python 3.10.8, pytest-7.2.0, pluggy-1.0.0 -- /opt/conda/bin/python 2023-01-11T22:10:32.0634297Z cachedir: .pytest_cache 2023-01-11T22:10:32.0634879Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2023-01-11T22:10:32.0635314Z torch: 2.0.0a0+git8419ddd 2023-01-11T22:10:32.0635640Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2023-01-11T22:10:32.0636524Z plugins: hypothesis-5.35.1, flakefinder-1.1.0, rerunfailures-10.3, shard-0.1.2, xdist-3.1.0, xdoctest-1.1.0 2023-01-11T22:10:32.0637198Z collecting ... 
collected 10 items 2023-01-11T22:10:32.0638462Z Running 10 items in this shard: test/distributed/pipeline/sync/test_microbatch.py::test_batch_atomic, test/distributed/pipeline/sync/test_microbatch.py::test_batch_non_atomic, test/distributed/pipeline/sync/test_microbatch.py::test_batch_call, test/distributed/pipeline/sync/test_microbatch.py::test_batch_setitem_by_index, test/distributed/pipeline/sync/test_microbatch.py::test_batch_setitem_by_slice, test/distributed/pipeline/sync/test_microbatch.py::test_check, test/distributed/pipeline/sync/test_microbatch.py::test_gather_tensors, test/distributed/pipeline/sync/test_microbatch.py::test_gather_tuples, test/distributed/pipeline/sync/test_microbatch.py::test_scatter_tensor, test/distributed/pipeline/sync/test_microbatch.py::test_scatter_multiple_tensors 2023-01-11T22:10:32.0639539Z 2023-01-11T22:10:32.0639765Z distributed/pipeline/sync/test_microbatch.py::test_batch_atomic PASSED [ 10%] 2023-01-11T22:10:32.0640213Z distributed/pipeline/sync/test_microbatch.py::test_batch_non_atomic PASSED [ 20%] 2023-01-11T22:10:32.0640761Z distributed/pipeline/sync/test_microbatch.py::test_batch_call PASSED [ 30%] 2023-01-11T22:10:32.0641205Z distributed/pipeline/sync/test_microbatch.py::test_batch_setitem_by_index PASSED [ 40%] 2023-01-11T22:10:32.0641667Z distributed/pipeline/sync/test_microbatch.py::test_batch_setitem_by_slice PASSED [ 50%] 2023-01-11T22:10:32.0642109Z distributed/pipeline/sync/test_microbatch.py::test_check PASSED [ 60%] 2023-01-11T22:10:32.0642521Z distributed/pipeline/sync/test_microbatch.py::test_gather_tensors PASSED [ 70%] 2023-01-11T22:10:32.0642952Z distributed/pipeline/sync/test_microbatch.py::test_gather_tuples PASSED [ 80%] 2023-01-11T22:10:32.0643380Z distributed/pipeline/sync/test_microbatch.py::test_scatter_tensor PASSED [ 90%] 2023-01-11T22:10:32.0643831Z distributed/pipeline/sync/test_microbatch.py::test_scatter_multiple_tensors PASSED [100%] 2023-01-11T22:10:32.0644099Z 2023-01-11T22:10:32.0644239Z 
============================== 10 passed in 0.08s ============================== 2023-01-11T22:10:32.0644434Z 2023-01-11T22:10:32.0644755Z ##[endgroup] 2023-01-11T22:10:32.0645422Z FINISHED PRINTING LOG FILE of distributed/pipeline/sync/test_microbatch (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-test_microbatch_9tnr4ziz) 2023-01-11T22:10:32.0645810Z 2023-01-11T22:10:32.0646095Z Running distributed/pipeline/sync/test_deferred_batch_norm ... [2023-01-11 22:10:32.063112] 2023-01-11T22:10:32.0646752Z Executing ['/opt/conda/bin/python', '-bb', '-m', 'pytest', 'distributed/pipeline/sync/test_deferred_batch_norm.py', '-v'] ... [2023-01-11 22:10:32.063445] 2023-01-11T22:10:34.9796519Z 2023-01-11T22:10:34.9797565Z Expand the folded group to see the log file of distributed/pipeline/sync/test_deferred_batch_norm 2023-01-11T22:10:34.9798996Z ##[group]PRINTING LOG FILE of distributed/pipeline/sync/test_deferred_batch_norm (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-test_deferred_batch_norm_a8diqwcq) 2023-01-11T22:10:34.9799885Z ============================= test session starts ============================== 2023-01-11T22:10:34.9800488Z platform linux -- Python 3.10.8, pytest-7.2.0, pluggy-1.0.0 -- /opt/conda/bin/python 2023-01-11T22:10:34.9800869Z cachedir: .pytest_cache 2023-01-11T22:10:34.9801436Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2023-01-11T22:10:34.9801846Z torch: 2.0.0a0+git8419ddd 2023-01-11T22:10:34.9802172Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2023-01-11T22:10:34.9803048Z plugins: hypothesis-5.35.1, flakefinder-1.1.0, rerunfailures-10.3, shard-0.1.2, xdist-3.1.0, xdoctest-1.1.0 2023-01-11T22:10:34.9803479Z collecting ... 
collected 11 items 2023-01-11T22:10:34.9805610Z Running 11 items in this shard: test/distributed/pipeline/sync/test_deferred_batch_norm.py::test_transparency[True-1], test/distributed/pipeline/sync/test_deferred_batch_norm.py::test_transparency[True-4], test/distributed/pipeline/sync/test_deferred_batch_norm.py::test_transparency[False-1], test/distributed/pipeline/sync/test_deferred_batch_norm.py::test_transparency[False-4], test/distributed/pipeline/sync/test_deferred_batch_norm.py::test_running_stats[0.1], test/distributed/pipeline/sync/test_deferred_batch_norm.py::test_running_stats[None], test/distributed/pipeline/sync/test_deferred_batch_norm.py::test_convert_deferred_batch_norm, test/distributed/pipeline/sync/test_deferred_batch_norm.py::test_eval, test/distributed/pipeline/sync/test_deferred_batch_norm.py::test_optimize, test/distributed/pipeline/sync/test_deferred_batch_norm.py::test_conv_bn, test/distributed/pipeline/sync/test_deferred_batch_norm.py::test_input_requiring_grad 2023-01-11T22:10:34.9807145Z 2023-01-11T22:10:34.9807545Z distributed/pipeline/sync/test_deferred_batch_norm.py::test_transparency[True-1] PASSED [ 9%] 2023-01-11T22:10:34.9808108Z distributed/pipeline/sync/test_deferred_batch_norm.py::test_transparency[True-4] PASSED [ 18%] 2023-01-11T22:10:34.9808767Z distributed/pipeline/sync/test_deferred_batch_norm.py::test_transparency[False-1] PASSED [ 27%] 2023-01-11T22:10:34.9809362Z distributed/pipeline/sync/test_deferred_batch_norm.py::test_transparency[False-4] PASSED [ 36%] 2023-01-11T22:10:34.9809818Z distributed/pipeline/sync/test_deferred_batch_norm.py::test_running_stats[0.1] PASSED [ 45%] 2023-01-11T22:10:34.9810284Z distributed/pipeline/sync/test_deferred_batch_norm.py::test_running_stats[None] PASSED [ 54%] 2023-01-11T22:10:34.9810766Z distributed/pipeline/sync/test_deferred_batch_norm.py::test_convert_deferred_batch_norm PASSED [ 63%] 2023-01-11T22:10:34.9811231Z distributed/pipeline/sync/test_deferred_batch_norm.py::test_eval 
PASSED [ 72%] 2023-01-11T22:10:34.9811656Z distributed/pipeline/sync/test_deferred_batch_norm.py::test_optimize PASSED [ 81%] 2023-01-11T22:10:34.9812106Z distributed/pipeline/sync/test_deferred_batch_norm.py::test_conv_bn PASSED [ 90%] 2023-01-11T22:10:34.9812564Z distributed/pipeline/sync/test_deferred_batch_norm.py::test_input_requiring_grad PASSED [100%] 2023-01-11T22:10:34.9812829Z 2023-01-11T22:10:34.9812989Z ============================== 11 passed in 0.61s ============================== 2023-01-11T22:10:34.9813163Z 2023-01-11T22:10:34.9815152Z ##[endgroup] 2023-01-11T22:10:34.9815877Z FINISHED PRINTING LOG FILE of distributed/pipeline/sync/test_deferred_batch_norm (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-test_deferred_batch_norm_a8diqwcq) 2023-01-11T22:10:34.9816280Z 2023-01-11T22:10:34.9817056Z Running distributed/pipeline/sync/test_bugs ... [2023-01-11 22:10:34.979733] 2023-01-11T22:10:34.9817746Z Executing ['/opt/conda/bin/python', '-bb', '-m', 'pytest', 'distributed/pipeline/sync/test_bugs.py', '-v'] ... 
[2023-01-11 22:10:34.980070] 2023-01-11T22:10:42.0747523Z 2023-01-11T22:10:42.0748410Z Expand the folded group to see the log file of distributed/pipeline/sync/test_bugs 2023-01-11T22:10:42.0750479Z ##[group]PRINTING LOG FILE of distributed/pipeline/sync/test_bugs (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-test_bugs_va704cfq) 2023-01-11T22:10:42.0751238Z ============================= test session starts ============================== 2023-01-11T22:10:42.0751821Z platform linux -- Python 3.10.8, pytest-7.2.0, pluggy-1.0.0 -- /opt/conda/bin/python 2023-01-11T22:10:42.0752198Z cachedir: .pytest_cache 2023-01-11T22:10:42.0752764Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2023-01-11T22:10:42.0753194Z torch: 2.0.0a0+git8419ddd 2023-01-11T22:10:42.0753502Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2023-01-11T22:10:42.0754070Z plugins: hypothesis-5.35.1, flakefinder-1.1.0, rerunfailures-10.3, shard-0.1.2, xdist-3.1.0, xdoctest-1.1.0 2023-01-11T22:10:42.0754469Z collecting ... 
collected 4 items 2023-01-11T22:10:42.0755379Z Running 4 items in this shard: test/distributed/pipeline/sync/test_bugs.py::test_python_autograd_function, test/distributed/pipeline/sync/test_bugs.py::test_exception_no_hang, test/distributed/pipeline/sync/test_bugs.py::test_tuple_wait, test/distributed/pipeline/sync/test_bugs.py::test_parallel_randoms 2023-01-11T22:10:42.0755916Z 2023-01-11T22:10:42.0756122Z distributed/pipeline/sync/test_bugs.py::test_python_autograd_function PASSED [ 25%] 2023-01-11T22:10:42.0756570Z distributed/pipeline/sync/test_bugs.py::test_exception_no_hang PASSED [ 50%] 2023-01-11T22:10:42.0756994Z distributed/pipeline/sync/test_bugs.py::test_tuple_wait PASSED [ 75%] 2023-01-11T22:10:42.0757414Z distributed/pipeline/sync/test_bugs.py::test_parallel_randoms PASSED [100%] 2023-01-11T22:10:42.0757639Z 2023-01-11T22:10:42.0757798Z ============================== 4 passed in 4.63s =============================== 2023-01-11T22:10:42.0757992Z 2023-01-11T22:10:42.0758302Z ##[endgroup] 2023-01-11T22:10:42.0758934Z FINISHED PRINTING LOG FILE of distributed/pipeline/sync/test_bugs (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-test_bugs_va704cfq) 2023-01-11T22:10:42.0759286Z 2023-01-11T22:10:42.0759679Z Running distributed/pipeline/sync/skip/test_tracker ... [2023-01-11 22:10:42.074781] 2023-01-11T22:10:42.0760322Z Executing ['/opt/conda/bin/python', '-bb', '-m', 'pytest', 'distributed/pipeline/sync/skip/test_tracker.py', '-v'] ... 
[2023-01-11 22:10:42.075127] 2023-01-11T22:10:45.9172708Z 2023-01-11T22:10:45.9173217Z Expand the folded group to see the log file of distributed/pipeline/sync/skip/test_tracker 2023-01-11T22:10:45.9174224Z ##[group]PRINTING LOG FILE of distributed/pipeline/sync/skip/test_tracker (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-skip-test_tracker_hr1yddfa) 2023-01-11T22:10:45.9174782Z ============================= test session starts ============================== 2023-01-11T22:10:45.9175395Z platform linux -- Python 3.10.8, pytest-7.2.0, pluggy-1.0.0 -- /opt/conda/bin/python 2023-01-11T22:10:45.9175762Z cachedir: .pytest_cache 2023-01-11T22:10:45.9176318Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2023-01-11T22:10:45.9177052Z torch: 2.0.0a0+git8419ddd 2023-01-11T22:10:45.9177382Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2023-01-11T22:10:45.9177959Z plugins: hypothesis-5.35.1, flakefinder-1.1.0, rerunfailures-10.3, shard-0.1.2, xdist-3.1.0, xdoctest-1.1.0 2023-01-11T22:10:45.9178355Z collecting ... 
collected 6 items 2023-01-11T22:10:45.9179318Z Running 6 items in this shard: test/distributed/pipeline/sync/skip/test_tracker.py::test_default_skip_tracker, test/distributed/pipeline/sync/skip/test_tracker.py::test_default_skip_tracker_by_data_parallel, test/distributed/pipeline/sync/skip/test_tracker.py::test_reuse_portal, test/distributed/pipeline/sync/skip/test_tracker.py::test_no_copy_no_portal, test/distributed/pipeline/sync/skip/test_tracker.py::test_tensor_life_without_checkpointing, test/distributed/pipeline/sync/skip/test_tracker.py::test_tensor_life_with_checkpointing 2023-01-11T22:10:45.9180127Z 2023-01-11T22:10:45.9180361Z distributed/pipeline/sync/skip/test_tracker.py::test_default_skip_tracker PASSED [ 16%] 2023-01-11T22:10:45.9180850Z distributed/pipeline/sync/skip/test_tracker.py::test_default_skip_tracker_by_data_parallel PASSED [ 33%] 2023-01-11T22:10:45.9181320Z distributed/pipeline/sync/skip/test_tracker.py::test_reuse_portal PASSED [ 50%] 2023-01-11T22:10:45.9181752Z distributed/pipeline/sync/skip/test_tracker.py::test_no_copy_no_portal PASSED [ 66%] 2023-01-11T22:10:45.9182231Z distributed/pipeline/sync/skip/test_tracker.py::test_tensor_life_without_checkpointing PASSED [ 83%] 2023-01-11T22:10:45.9182730Z distributed/pipeline/sync/skip/test_tracker.py::test_tensor_life_with_checkpointing PASSED [100%] 2023-01-11T22:10:45.9183001Z 2023-01-11T22:10:45.9183158Z ============================== 6 passed in 1.39s =============================== 2023-01-11T22:10:45.9183604Z 2023-01-11T22:10:45.9183918Z ##[endgroup] 2023-01-11T22:10:45.9184595Z FINISHED PRINTING LOG FILE of distributed/pipeline/sync/skip/test_tracker (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-skip-test_tracker_hr1yddfa) 2023-01-11T22:10:45.9184988Z 2023-01-11T22:10:45.9185272Z Running distributed/pipeline/sync/skip/test_leak ... 
[2023-01-11 22:10:45.917289] 2023-01-11T22:10:45.9185870Z Executing ['/opt/conda/bin/python', '-bb', '-m', 'pytest', 'distributed/pipeline/sync/skip/test_leak.py', '-v'] ... [2023-01-11 22:10:45.917699] 2023-01-11T22:10:48.5431108Z 2023-01-11T22:10:48.5431902Z Expand the folded group to see the log file of distributed/pipeline/sync/skip/test_leak 2023-01-11T22:10:48.5433188Z ##[group]PRINTING LOG FILE of distributed/pipeline/sync/skip/test_leak (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-skip-test_leak__h1klqcm) 2023-01-11T22:10:48.5433732Z ============================= test session starts ============================== 2023-01-11T22:10:48.5434340Z platform linux -- Python 3.10.8, pytest-7.2.0, pluggy-1.0.0 -- /opt/conda/bin/python 2023-01-11T22:10:48.5434693Z cachedir: .pytest_cache 2023-01-11T22:10:48.5435485Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2023-01-11T22:10:48.5435930Z torch: 2.0.0a0+git8419ddd 2023-01-11T22:10:48.5436261Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2023-01-11T22:10:48.5436832Z plugins: hypothesis-5.35.1, flakefinder-1.1.0, rerunfailures-10.3, shard-0.1.2, xdist-3.1.0, xdoctest-1.1.0 2023-01-11T22:10:48.5437215Z collecting ... 
collected 8 items 2023-01-11T22:10:48.5438971Z Running 8 items in this shard: test/distributed/pipeline/sync/skip/test_leak.py::test_delete_portal_tensor[always-train], test/distributed/pipeline/sync/skip/test_leak.py::test_delete_portal_tensor[always-eval], test/distributed/pipeline/sync/skip/test_leak.py::test_delete_portal_tensor[except_last-train], test/distributed/pipeline/sync/skip/test_leak.py::test_delete_portal_tensor[except_last-eval], test/distributed/pipeline/sync/skip/test_leak.py::test_delete_portal_tensor[never-train], test/distributed/pipeline/sync/skip/test_leak.py::test_delete_portal_tensor[never-eval], test/distributed/pipeline/sync/skip/test_leak.py::test_no_portal_without_pipe[train], test/distributed/pipeline/sync/skip/test_leak.py::test_no_portal_without_pipe[eval] 2023-01-11T22:10:48.5440047Z 2023-01-11T22:10:48.5440390Z distributed/pipeline/sync/skip/test_leak.py::test_delete_portal_tensor[always-train] PASSED [ 12%] 2023-01-11T22:10:48.5440955Z distributed/pipeline/sync/skip/test_leak.py::test_delete_portal_tensor[always-eval] PASSED [ 25%] 2023-01-11T22:10:48.5441552Z distributed/pipeline/sync/skip/test_leak.py::test_delete_portal_tensor[except_last-train] PASSED [ 37%] 2023-01-11T22:10:48.5442149Z distributed/pipeline/sync/skip/test_leak.py::test_delete_portal_tensor[except_last-eval] PASSED [ 50%] 2023-01-11T22:10:48.5442740Z distributed/pipeline/sync/skip/test_leak.py::test_delete_portal_tensor[never-train] PASSED [ 62%] 2023-01-11T22:10:48.5443320Z distributed/pipeline/sync/skip/test_leak.py::test_delete_portal_tensor[never-eval] PASSED [ 75%] 2023-01-11T22:10:48.5443801Z distributed/pipeline/sync/skip/test_leak.py::test_no_portal_without_pipe[train] PASSED [ 87%] 2023-01-11T22:10:48.5444272Z distributed/pipeline/sync/skip/test_leak.py::test_no_portal_without_pipe[eval] PASSED [100%] 2023-01-11T22:10:48.5444533Z 2023-01-11T22:10:48.5444691Z ============================== 8 passed in 0.28s =============================== 
2023-01-11T22:10:48.5444865Z 2023-01-11T22:10:48.5445177Z ##[endgroup] 2023-01-11T22:10:48.5445819Z FINISHED PRINTING LOG FILE of distributed/pipeline/sync/skip/test_leak (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-skip-test_leak__h1klqcm) 2023-01-11T22:10:48.5446203Z 2023-01-11T22:10:48.5446482Z Running distributed/pipeline/sync/skip/test_api ... [2023-01-11 22:10:48.543189] 2023-01-11T22:10:48.5447182Z Executing ['/opt/conda/bin/python', '-bb', '-m', 'pytest', 'distributed/pipeline/sync/skip/test_api.py', '-v'] ... [2023-01-11 22:10:48.543429] 2023-01-11T22:10:50.9417842Z 2023-01-11T22:10:50.9418526Z Expand the folded group to see the log file of distributed/pipeline/sync/skip/test_api 2023-01-11T22:10:50.9419491Z ##[group]PRINTING LOG FILE of distributed/pipeline/sync/skip/test_api (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-skip-test_api_abhl4drg) 2023-01-11T22:10:50.9420054Z ============================= test session starts ============================== 2023-01-11T22:10:50.9420654Z platform linux -- Python 3.10.8, pytest-7.2.0, pluggy-1.0.0 -- /opt/conda/bin/python 2023-01-11T22:10:50.9420995Z cachedir: .pytest_cache 2023-01-11T22:10:50.9421564Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2023-01-11T22:10:50.9422005Z torch: 2.0.0a0+git8419ddd 2023-01-11T22:10:50.9422341Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2023-01-11T22:10:50.9422882Z plugins: hypothesis-5.35.1, flakefinder-1.1.0, rerunfailures-10.3, shard-0.1.2, xdist-3.1.0, xdoctest-1.1.0 2023-01-11T22:10:50.9423971Z collecting ... 
collected 3 items 2023-01-11T22:10:50.9425093Z Running 3 items in this shard: test/distributed/pipeline/sync/skip/test_api.py::test_namespace_difference, test/distributed/pipeline/sync/skip/test_api.py::test_namespace_copy, test/distributed/pipeline/sync/skip/test_api.py::test_skippable_repr 2023-01-11T22:10:50.9425915Z 2023-01-11T22:10:50.9426329Z distributed/pipeline/sync/skip/test_api.py::test_namespace_difference PASSED [ 33%] 2023-01-11T22:10:50.9427163Z distributed/pipeline/sync/skip/test_api.py::test_namespace_copy PASSED [ 66%] 2023-01-11T22:10:50.9428031Z distributed/pipeline/sync/skip/test_api.py::test_skippable_repr PASSED [100%] 2023-01-11T22:10:50.9428545Z 2023-01-11T22:10:50.9428847Z ============================== 3 passed in 0.05s =============================== 2023-01-11T22:10:50.9429242Z 2023-01-11T22:10:50.9429800Z ##[endgroup] 2023-01-11T22:10:50.9431173Z FINISHED PRINTING LOG FILE of distributed/pipeline/sync/skip/test_api (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-skip-test_api_abhl4drg) 2023-01-11T22:10:50.9431969Z 2023-01-11T22:10:50.9432505Z Running distributed/fsdp/test_shard_utils ... [2023-01-11 22:10:50.941885] 2023-01-11T22:10:50.9433797Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_shard_utils.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... 
[2023-01-11 22:10:50.942133] 2023-01-11T22:10:53.1172072Z 2023-01-11T22:10:53.1172768Z Expand the folded group to see the log file of distributed/fsdp/test_shard_utils 2023-01-11T22:10:53.1173819Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_shard_utils (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_shard_utils_6a6g10z_) 2023-01-11T22:10:53.1174194Z 2023-01-11T22:10:53.1174491Z ##[endgroup] 2023-01-11T22:10:53.1175212Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_shard_utils (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_shard_utils_6a6g10z_) 2023-01-11T22:10:53.1175565Z 2023-01-11T22:10:53.1175894Z Running distributed/_shard/sharded_tensor/ops/test_math_ops ... [2023-01-11 22:10:53.117263] 2023-01-11T22:10:53.1177949Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/_shard/sharded_tensor/ops/test_math_ops.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:10:53.117503] 2023-01-11T22:10:55.2237396Z 2023-01-11T22:10:55.2238125Z Expand the folded group to see the log file of distributed/_shard/sharded_tensor/ops/test_math_ops 2023-01-11T22:10:55.2239491Z ##[group]PRINTING LOG FILE of distributed/_shard/sharded_tensor/ops/test_math_ops (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-sharded_tensor-ops-test_math_ops_ndqw4yl8) 2023-01-11T22:10:55.2239918Z 2023-01-11T22:10:55.2240236Z ##[endgroup] 2023-01-11T22:10:55.2241645Z FINISHED PRINTING LOG FILE of distributed/_shard/sharded_tensor/ops/test_math_ops (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-sharded_tensor-ops-test_math_ops_ndqw4yl8) 2023-01-11T22:10:55.2242051Z 2023-01-11T22:10:55.2242363Z Running distributed/elastic/metrics/api_test ... [2023-01-11 22:10:55.223768] 2023-01-11T22:10:55.2243038Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/elastic/metrics/api_test.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... 
[2023-01-11 22:10:55.224010] 2023-01-11T22:10:59.0095946Z 2023-01-11T22:10:59.0096951Z Expand the folded group to see the log file of distributed/elastic/metrics/api_test 2023-01-11T22:10:59.0097918Z ##[group]PRINTING LOG FILE of distributed/elastic/metrics/api_test (/var/lib/jenkins/workspace/test/test-reports/distributed-elastic-metrics-api_test_9czh5snu) 2023-01-11T22:10:59.0098299Z 2023-01-11T22:10:59.0098420Z Running tests... 2023-01-11T22:10:59.0098959Z ---------------------------------------------------------------------- 2023-01-11T22:10:59.0099553Z Test results will be stored in test-reports/python-unittest/distributed.elastic.metrics.api_test 2023-01-11T22:10:59.0100254Z test_get_metric_name (__main__.MetricsApiTest) ... ok (1.632s) 2023-01-11T22:10:59.0100654Z test_inheritance (__main__.MetricsApiTest) ... ok (0.001s) 2023-01-11T22:10:59.0101014Z test_profile (__main__.MetricsApiTest) ... ok (0.002s) 2023-01-11T22:10:59.0101220Z 2023-01-11T22:10:59.0101494Z ---------------------------------------------------------------------- 2023-01-11T22:10:59.0101807Z Ran 3 tests in 1.635s 2023-01-11T22:10:59.0101973Z 2023-01-11T22:10:59.0102072Z OK 2023-01-11T22:10:59.0102207Z 2023-01-11T22:10:59.0102335Z Generating XML reports... 2023-01-11T22:10:59.0102923Z Generated XML report: test-reports/python-unittest/distributed.elastic.metrics.api_test/TEST-MetricsApiTest-20230111221056.xml 2023-01-11T22:10:59.0103280Z 2023-01-11T22:10:59.0103595Z ##[endgroup] 2023-01-11T22:10:59.0104212Z FINISHED PRINTING LOG FILE of distributed/elastic/metrics/api_test (/var/lib/jenkins/workspace/test/test-reports/distributed-elastic-metrics-api_test_9czh5snu) 2023-01-11T22:10:59.0104590Z 2023-01-11T22:10:59.0104874Z Running distributed/checkpoint/test_utils ... [2023-01-11 22:10:59.009584] 2023-01-11T22:10:59.0105534Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/checkpoint/test_utils.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... 
[2023-01-11 22:10:59.009857] 2023-01-11T22:11:02.9894953Z 2023-01-11T22:11:02.9895876Z Expand the folded group to see the log file of distributed/checkpoint/test_utils 2023-01-11T22:11:02.9897919Z ##[group]PRINTING LOG FILE of distributed/checkpoint/test_utils (/var/lib/jenkins/workspace/test/test-reports/distributed-checkpoint-test_utils_yjfsv_d8) 2023-01-11T22:11:02.9898733Z 2023-01-11T22:11:02.9898939Z Running tests... 2023-01-11T22:11:02.9899534Z ---------------------------------------------------------------------- 2023-01-11T22:11:02.9900098Z Test results will be stored in test-reports/python-unittest/distributed.checkpoint.test_utils 2023-01-11T22:11:02.9900894Z test_flat_data (__main__.TestMedatadaIndex) ... ok (1.660s) 2023-01-11T22:11:02.9901786Z test_index_hint_ignored_on_equals (__main__.TestMedatadaIndex) ... ok (0.001s) 2023-01-11T22:11:02.9902295Z test_index_hint_ignored_on_hash (__main__.TestMedatadaIndex) ... ok (0.001s) 2023-01-11T22:11:02.9902698Z test_init_convert_offset (__main__.TestMedatadaIndex) ... ok (0.001s) 2023-01-11T22:11:02.9903079Z test_sharded_tensor_lookup (__main__.TestMedatadaIndex) ... ok (0.003s) 2023-01-11T22:11:02.9903313Z 2023-01-11T22:11:02.9903595Z ---------------------------------------------------------------------- 2023-01-11T22:11:02.9903921Z Ran 5 tests in 1.666s 2023-01-11T22:11:02.9904083Z 2023-01-11T22:11:02.9904158Z OK 2023-01-11T22:11:02.9904293Z 2023-01-11T22:11:02.9904418Z Generating XML reports... 2023-01-11T22:11:02.9905030Z Generated XML report: test-reports/python-unittest/distributed.checkpoint.test_utils/TEST-TestMedatadaIndex-20230111221100.xml 2023-01-11T22:11:02.9905660Z 2023-01-11T22:11:02.9905962Z ##[endgroup] 2023-01-11T22:11:02.9906573Z FINISHED PRINTING LOG FILE of distributed/checkpoint/test_utils (/var/lib/jenkins/workspace/test/test-reports/distributed-checkpoint-test_utils_yjfsv_d8) 2023-01-11T22:11:02.9906936Z 2023-01-11T22:11:02.9907219Z Running distributed/checkpoint/test_nested_dict ... 
[2023-01-11 22:11:02.989469] 2023-01-11T22:11:02.9907923Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/checkpoint/test_nested_dict.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:11:02.989748] 2023-01-11T22:11:06.9515664Z 2023-01-11T22:11:06.9516538Z Expand the folded group to see the log file of distributed/checkpoint/test_nested_dict 2023-01-11T22:11:06.9518209Z ##[group]PRINTING LOG FILE of distributed/checkpoint/test_nested_dict (/var/lib/jenkins/workspace/test/test-reports/distributed-checkpoint-test_nested_dict_iycb6om7) 2023-01-11T22:11:06.9518652Z 2023-01-11T22:11:06.9518849Z Running tests... 2023-01-11T22:11:06.9519902Z ---------------------------------------------------------------------- 2023-01-11T22:11:06.9521201Z Test results will be stored in test-reports/python-unittest/distributed.checkpoint.test_nested_dict 2023-01-11T22:11:06.9522222Z test_flattening_round_trip (__main__.TestFlattening) ... ok (1.645s) 2023-01-11T22:11:06.9522601Z test_mapping (__main__.TestFlattening) ... ok (0.002s) 2023-01-11T22:11:06.9522810Z 2023-01-11T22:11:06.9523074Z ---------------------------------------------------------------------- 2023-01-11T22:11:06.9523398Z Ran 2 tests in 1.647s 2023-01-11T22:11:06.9523560Z 2023-01-11T22:11:06.9523656Z OK 2023-01-11T22:11:06.9523788Z 2023-01-11T22:11:06.9523912Z Generating XML reports... 2023-01-11T22:11:06.9524506Z Generated XML report: test-reports/python-unittest/distributed.checkpoint.test_nested_dict/TEST-TestFlattening-20230111221104.xml 2023-01-11T22:11:06.9524852Z 2023-01-11T22:11:06.9525168Z ##[endgroup] 2023-01-11T22:11:06.9525802Z FINISHED PRINTING LOG FILE of distributed/checkpoint/test_nested_dict (/var/lib/jenkins/workspace/test/test-reports/distributed-checkpoint-test_nested_dict_iycb6om7) 2023-01-11T22:11:06.9526160Z 2023-01-11T22:11:06.9526444Z Running distributed/elastic/utils/logging_test ... 
[2023-01-11 22:11:06.951561] 2023-01-11T22:11:06.9527138Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/elastic/utils/logging_test.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:11:06.951808] 2023-01-11T22:11:10.9117450Z 2023-01-11T22:11:10.9118321Z Expand the folded group to see the log file of distributed/elastic/utils/logging_test 2023-01-11T22:11:10.9120168Z ##[group]PRINTING LOG FILE of distributed/elastic/utils/logging_test (/var/lib/jenkins/workspace/test/test-reports/distributed-elastic-utils-logging_test_vt_9wwxq) 2023-01-11T22:11:10.9120892Z 2023-01-11T22:11:10.9121015Z Running tests... 2023-01-11T22:11:10.9121530Z ---------------------------------------------------------------------- 2023-01-11T22:11:10.9122129Z Test results will be stored in test-reports/python-unittest/distributed.elastic.utils.logging_test 2023-01-11T22:11:10.9122576Z test_derive_module_name (__main__.LoggingTest) ... ok (1.637s) 2023-01-11T22:11:10.9122932Z test_logger_name (__main__.LoggingTest) ... ok (0.002s) 2023-01-11T22:11:10.9123135Z 2023-01-11T22:11:10.9123401Z ---------------------------------------------------------------------- 2023-01-11T22:11:10.9123713Z Ran 2 tests in 1.639s 2023-01-11T22:11:10.9123892Z 2023-01-11T22:11:10.9123986Z OK 2023-01-11T22:11:10.9124120Z 2023-01-11T22:11:10.9124244Z Generating XML reports... 2023-01-11T22:11:10.9124845Z Generated XML report: test-reports/python-unittest/distributed.elastic.utils.logging_test/TEST-LoggingTest-20230111221108.xml 2023-01-11T22:11:10.9125196Z 2023-01-11T22:11:10.9125483Z ##[endgroup] 2023-01-11T22:11:10.9126102Z FINISHED PRINTING LOG FILE of distributed/elastic/utils/logging_test (/var/lib/jenkins/workspace/test/test-reports/distributed-elastic-utils-logging_test_vt_9wwxq) 2023-01-11T22:11:10.9126741Z 2023-01-11T22:11:10.9127023Z Running distributed/elastic/utils/util_test ... 
[2023-01-11 22:11:10.911759] 2023-01-11T22:11:10.9127681Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/elastic/utils/util_test.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:11:10.912001] 2023-01-11T22:11:14.8929203Z 2023-01-11T22:11:14.8930069Z Expand the folded group to see the log file of distributed/elastic/utils/util_test 2023-01-11T22:11:14.8931805Z ##[group]PRINTING LOG FILE of distributed/elastic/utils/util_test (/var/lib/jenkins/workspace/test/test-reports/distributed-elastic-utils-util_test_8na8zdc6) 2023-01-11T22:11:14.8932514Z 2023-01-11T22:11:14.8932711Z Running tests... 2023-01-11T22:11:14.8933585Z ---------------------------------------------------------------------- 2023-01-11T22:11:14.8934628Z Test results will be stored in test-reports/python-unittest/distributed.elastic.utils.util_test 2023-01-11T22:11:14.8935487Z test_get_all_rank_0 (__main__.StoreUtilTest) ... ok (1.640s) 2023-01-11T22:11:14.8936128Z test_get_all_rank_n (__main__.StoreUtilTest) ... ok (0.002s) 2023-01-11T22:11:14.8937463Z test_synchronize (__main__.StoreUtilTest) ... ok (0.003s) 2023-01-11T22:11:14.8937834Z test_get_logger (__main__.UtilTest) ... ok (0.104s) 2023-01-11T22:11:14.8938182Z test_get_logger_custom_name (__main__.UtilTest) ... ok (0.001s) 2023-01-11T22:11:14.8938545Z test_get_logger_different (__main__.UtilTest) ... ok (0.001s) 2023-01-11T22:11:14.8938901Z test_get_logger_none (__main__.UtilTest) ... ok (0.001s) 2023-01-11T22:11:14.8939085Z 2023-01-11T22:11:14.8939377Z ---------------------------------------------------------------------- 2023-01-11T22:11:14.8939807Z Ran 7 tests in 1.752s 2023-01-11T22:11:14.8940101Z 2023-01-11T22:11:14.8940251Z OK 2023-01-11T22:11:14.8940473Z 2023-01-11T22:11:14.8940655Z Generating XML reports... 
2023-01-11T22:11:14.8941651Z Generated XML report: test-reports/python-unittest/distributed.elastic.utils.util_test/TEST-StoreUtilTest-20230111221112.xml 2023-01-11T22:11:14.8942881Z Generated XML report: test-reports/python-unittest/distributed.elastic.utils.util_test/TEST-UtilTest-20230111221112.xml 2023-01-11T22:11:14.8943393Z 2023-01-11T22:11:14.8943889Z ##[endgroup] 2023-01-11T22:11:14.8944824Z FINISHED PRINTING LOG FILE of distributed/elastic/utils/util_test (/var/lib/jenkins/workspace/test/test-reports/distributed-elastic-utils-util_test_8na8zdc6) 2023-01-11T22:11:14.8945383Z 2023-01-11T22:11:14.8945840Z Running distributed/test_multi_threaded_pg ... [2023-01-11 22:11:14.892914] 2023-01-11T22:11:14.8947146Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/test_multi_threaded_pg.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:11:14.893244] 2023-01-11T22:11:19.0174191Z 2023-01-11T22:11:19.0174701Z Expand the folded group to see the log file of distributed/test_multi_threaded_pg 2023-01-11T22:11:19.0175629Z ##[group]PRINTING LOG FILE of distributed/test_multi_threaded_pg (/var/lib/jenkins/workspace/test/test-reports/distributed-test_multi_threaded_pg_huzd5van) 2023-01-11T22:11:19.0176006Z 2023-01-11T22:11:19.0176119Z Running tests... 2023-01-11T22:11:19.0177292Z ---------------------------------------------------------------------- 2023-01-11T22:11:19.0178337Z Test results will be stored in test-reports/python-unittest/distributed.test_multi_threaded_pg 2023-01-11T22:11:19.0179508Z test_all_reduce (__main__.TestCollectivesWithBaseClass) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:11:19.0180620Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:11:19.0181234Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:11:19.0181715Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:11:19.0182175Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:11:19.0183116Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0183797Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0184454Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0185119Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0185507Z ok (1.697s) 2023-01-11T22:11:19.0185964Z test_allgather (__main__.TestCollectivesWithBaseClass) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:11:19.0186522Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:11:19.0187008Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:11:19.0187753Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 
2023-01-11T22:11:19.0188289Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:11:19.0188915Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0189589Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0190262Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0190639Z ok (0.019s) 2023-01-11T22:11:19.0191086Z test_assert_equal_on_rank (__main__.TestCollectivesWithBaseClass) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:11:19.0191673Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:11:19.0192156Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:11:19.0192613Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:11:19.0193253Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0193923Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0194585Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0195231Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0195613Z ok (0.014s) 2023-01-11T22:11:19.0196073Z test_broadcast (__main__.TestCollectivesWithBaseClass) ... 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:11:19.0196642Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:11:19.0197105Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:11:19.0197581Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:11:19.0198222Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0198887Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0199605Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0200275Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0200658Z ok (0.020s) 2023-01-11T22:11:19.0201111Z test_broadcast_object_list (__main__.TestCollectivesWithBaseClass) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:11:19.0201691Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:11:19.0202171Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:11:19.0202647Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:11:19.0203269Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 
2023-01-11T22:11:19.0203939Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0204659Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0205336Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0205723Z 3 -> 4 2023-01-11T22:11:19.0205957Z 0 -> 4 2023-01-11T22:11:19.0206187Z 1 -> 4 2023-01-11T22:11:19.0206402Z 2 -> 4 2023-01-11T22:11:19.0206614Z ok (0.016s) 2023-01-11T22:11:19.0207077Z test_reduce_scatter (__main__.TestCollectivesWithBaseClass) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:11:19.0207637Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:11:19.0208124Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:11:19.0208601Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:11:19.0209313Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0209969Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0210633Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0211304Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0211686Z ok (0.015s) 2023-01-11T22:11:19.0212121Z test_scatter (__main__.TestCollectivesWithBaseClass) ... 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:11:19.0212692Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:11:19.0213172Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:11:19.0213646Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:11:19.0214265Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0214928Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0215594Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0216255Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0216915Z ok (0.014s) 2023-01-11T22:11:19.0217395Z test_broadcast_object_list (__main__.TestCollectivesWithWrapper) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:11:19.0217981Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:11:19.0218444Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:11:19.0218921Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:11:19.0219568Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 
2023-01-11T22:11:19.0220235Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0220882Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0221635Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0222034Z ok (0.016s) 2023-01-11T22:11:19.0222517Z test_collective_error_on_rank_non_zero (__main__.TestCollectivesWithWrapper) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:11:19.0223094Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:11:19.0223571Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:11:19.0224218Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0224738Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:11:19.0225363Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0226030Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0226700Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 
2023-01-11T22:11:19.0227162Z ERROR:torch.testing._internal.common_distributed:Caught exception: 2023-01-11T22:11:19.0227513Z Traceback (most recent call last): 2023-01-11T22:11:19.0228060Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/distributed/multi_threaded_pg.py", line 365, in worker 2023-01-11T22:11:19.0228442Z callback() 2023-01-11T22:11:19.0228917Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 896, in 2023-01-11T22:11:19.0229364Z "runTest", timeout, world_size, lambda: func(self, *args, **kwargs) 2023-01-11T22:11:19.0229799Z File "/var/lib/jenkins/workspace/test/distributed/test_multi_threaded_pg.py", line 57, in _test_method 2023-01-11T22:11:19.0230217Z raise AssertionError("Mimic real test failure.") # fail on rank 1 2023-01-11T22:11:19.0230565Z AssertionError: Mimic real test failure. 2023-01-11T22:11:19.0230846Z exiting thread 1 2023-01-11T22:11:19.0231082Z ok (0.013s) 2023-01-11T22:11:19.0231554Z test_collective_error_on_rank_non_zero_all (__main__.TestCollectivesWithWrapper) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:11:19.0232155Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:11:19.0232639Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:11:19.0233270Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0233883Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:11:19.0234524Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 
2023-01-11T22:11:19.0235189Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0235844Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0236323Z ERROR:torch.testing._internal.common_distributed:Caught exception: 2023-01-11T22:11:19.0236672Z Traceback (most recent call last): 2023-01-11T22:11:19.0237217Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/distributed/multi_threaded_pg.py", line 365, in worker 2023-01-11T22:11:19.0237582Z callback() 2023-01-11T22:11:19.0238073Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 896, in 2023-01-11T22:11:19.0238514Z "runTest", timeout, world_size, lambda: func(self, *args, **kwargs) 2023-01-11T22:11:19.0238986Z File "/var/lib/jenkins/workspace/test/distributed/test_multi_threaded_pg.py", line 72, in _test_method 2023-01-11T22:11:19.0239525Z raise AssertionError("Mimic real test failure.") # fail on all non-zero rank 2023-01-11T22:11:19.0239884Z AssertionError: Mimic real test failure. 2023-01-11T22:11:19.0240164Z exiting thread 2 2023-01-11T22:11:19.0240385Z ok (0.013s) 2023-01-11T22:11:19.0240863Z test_collective_error_on_rank_zero (__main__.TestCollectivesWithWrapper) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:11:19.0241460Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:11:19.0241926Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:11:19.0242573Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 
2023-01-11T22:11:19.0243098Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:11:19.0243734Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0244385Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0245054Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0245532Z ERROR:torch.testing._internal.common_distributed:Caught exception: 2023-01-11T22:11:19.0245886Z Traceback (most recent call last): 2023-01-11T22:11:19.0246410Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/distributed/multi_threaded_pg.py", line 365, in worker 2023-01-11T22:11:19.0246796Z callback() 2023-01-11T22:11:19.0247282Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 896, in 2023-01-11T22:11:19.0247710Z "runTest", timeout, world_size, lambda: func(self, *args, **kwargs) 2023-01-11T22:11:19.0248143Z File "/var/lib/jenkins/workspace/test/distributed/test_multi_threaded_pg.py", line 42, in _test_method 2023-01-11T22:11:19.0248572Z raise AssertionError("Mimic real test failure.") # fail on rank 0 2023-01-11T22:11:19.0248900Z AssertionError: Mimic real test failure. 2023-01-11T22:11:19.0249183Z exiting thread 0 2023-01-11T22:11:19.0249418Z ok (0.013s) 2023-01-11T22:11:19.0249866Z test_skip (__main__.TestCollectivesWithWrapper) ... 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:11:19.0250415Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:11:19.0250961Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:11:19.0251441Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:11:19.0252067Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0252747Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0253417Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0254112Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:11:19.0254705Z INFO:torch.testing._internal.common_distributed:Thread 3 skipping test runTest for following reason: check if skip exception can be captured correctly. 2023-01-11T22:11:19.0255369Z INFO:torch.testing._internal.common_distributed:Thread 0 skipping test runTest for following reason: check if skip exception can be captured correctly. 2023-01-11T22:11:19.0255988Z INFO:torch.testing._internal.common_distributed:Thread 1 skipping test runTest for following reason: check if skip exception can be captured correctly. 2023-01-11T22:11:19.0256845Z INFO:torch.testing._internal.common_distributed:Thread 2 skipping test runTest for following reason: check if skip exception can be captured correctly. 
2023-01-11T22:11:19.0257264Z ok (0.012s) 2023-01-11T22:11:19.0257398Z 2023-01-11T22:11:19.0257679Z ---------------------------------------------------------------------- 2023-01-11T22:11:19.0258006Z Ran 12 tests in 1.864s 2023-01-11T22:11:19.0258167Z 2023-01-11T22:11:19.0258258Z OK 2023-01-11T22:11:19.0258391Z 2023-01-11T22:11:19.0258497Z Generating XML reports... 2023-01-11T22:11:19.0259135Z Generated XML report: test-reports/python-unittest/distributed.test_multi_threaded_pg/TEST-TestCollectivesWithBaseClass-20230111221116.xml 2023-01-11T22:11:19.0259962Z Generated XML report: test-reports/python-unittest/distributed.test_multi_threaded_pg/TEST-TestCollectivesWithWrapper-20230111221116.xml 2023-01-11T22:11:19.0260334Z 2023-01-11T22:11:19.0260642Z ##[endgroup] 2023-01-11T22:11:19.0261236Z FINISHED PRINTING LOG FILE of distributed/test_multi_threaded_pg (/var/lib/jenkins/workspace/test/test-reports/distributed-test_multi_threaded_pg_huzd5van) 2023-01-11T22:11:19.0261581Z 2023-01-11T22:11:19.0261848Z Running distributed/rpc/test_share_memory ... [2023-01-11 22:11:19.017438] 2023-01-11T22:11:19.0262546Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/rpc/test_share_memory.py', '-v', '--subprocess', '--import-slow-tests', '--import-disabled-tests'] ... 
[2023-01-11 22:11:19.017719] 2023-01-11T22:11:27.5701309Z 2023-01-11T22:11:27.5702084Z Expand the folded group to see the log file of distributed/rpc/test_share_memory 2023-01-11T22:11:27.5703585Z ##[group]PRINTING LOG FILE of distributed/rpc/test_share_memory (/var/lib/jenkins/workspace/test/test-reports/distributed-rpc-test_share_memory_7184pciy) 2023-01-11T22:11:27.5704104Z 2023-01-11T22:11:27.5704491Z ]> 2023-01-11T22:11:27.5704871Z test_case (__main__.TestRPCPickler) 2023-01-11T22:11:27.5705535Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:11:27.5705993Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:11:27.5706554Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:11:27.5707020Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:11:27.5707251Z 2023-01-11T22:11:27.5707648Z Running tests... 2023-01-11T22:11:27.5708043Z ---------------------------------------------------------------------- 2023-01-11T22:11:27.5708589Z Test results will be stored in test-reports/python-unittest/distributed.rpc.test_share_memory 2023-01-11T22:11:27.5709349Z test_case (__main__.TestRPCPickler) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:11:27.5709843Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:11:27.5710398Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:11:27.5710861Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:11:27.5711923Z /opt/conda/lib/python3.10/site-packages/torch/multiprocessing/reductions.py:355: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:11:27.5712623Z if storage.is_cuda: 2023-01-11T22:11:27.5712851Z ok (4.285s) 2023-01-11T22:11:27.5713113Z 2023-01-11T22:11:27.5713398Z ---------------------------------------------------------------------- 2023-01-11T22:11:27.5713745Z Ran 1 test in 4.285s 2023-01-11T22:11:27.5713906Z 2023-01-11T22:11:27.5714000Z OK 2023-01-11T22:11:27.5714116Z 2023-01-11T22:11:27.5714240Z Generating XML reports... 2023-01-11T22:11:27.5714816Z Generated XML report: test-reports/python-unittest/distributed.rpc.test_share_memory/TEST-TestRPCPickler-20230111221122.xml 2023-01-11T22:11:27.5715151Z 2023-01-11T22:11:27.5715472Z ##[endgroup] 2023-01-11T22:11:27.5716042Z FINISHED PRINTING LOG FILE of distributed/rpc/test_share_memory (/var/lib/jenkins/workspace/test/test-reports/distributed-rpc-test_share_memory_7184pciy) 2023-01-11T22:11:27.5716389Z 2023-01-11T22:11:27.5716683Z Running distributed/elastic/utils/distributed_test ... [2023-01-11 22:11:27.570160] 2023-01-11T22:11:27.5734296Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/elastic/utils/distributed_test.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:11:27.570414] 2023-01-11T22:11:34.6798193Z 2023-01-11T22:11:34.6798705Z Expand the folded group to see the log file of distributed/elastic/utils/distributed_test 2023-01-11T22:11:34.6799705Z ##[group]PRINTING LOG FILE of distributed/elastic/utils/distributed_test (/var/lib/jenkins/workspace/test/test-reports/distributed-elastic-utils-distributed_test_ayg1bdov) 2023-01-11T22:11:34.6800108Z 2023-01-11T22:11:34.6800225Z Running tests... 
2023-01-11T22:11:34.6800722Z ---------------------------------------------------------------------- 2023-01-11T22:11:34.6801312Z Test results will be stored in test-reports/python-unittest/distributed.elastic.utils.distributed_test 2023-01-11T22:11:34.6801802Z test_create_store_multi (__main__.DistributedUtilTest) ... ok (1.686s) 2023-01-11T22:11:34.6802232Z test_create_store_no_port_multi (__main__.DistributedUtilTest) ... ok (0.001s) 2023-01-11T22:11:34.6802642Z test_create_store_single_server (__main__.DistributedUtilTest) ... ok (0.004s) 2023-01-11T22:11:34.6803064Z test_create_store_timeout_on_server (__main__.DistributedUtilTest) ... ok (3.019s) 2023-01-11T22:11:34.6803611Z test_create_store_timeout_on_worker (__main__.DistributedUtilTest) ... [E socket.cpp:860] [c10d] The client socket has timed out after 1s while trying to connect to (7c5487d9c02b, 0). 2023-01-11T22:11:34.6804014Z ok (0.001s) 2023-01-11T22:11:34.6804676Z test_port_already_in_use_on_server (__main__.DistributedUtilTest) ... [W socket.cpp:426] [c10d] The server socket has failed to bind to [::]:42713 (errno: 98 - Address already in use). 2023-01-11T22:11:34.6805345Z [W socket.cpp:426] [c10d] The server socket has failed to bind to 0.0.0.0:42713 (errno: 98 - Address already in use). 2023-01-11T22:11:34.6805800Z [E socket.cpp:462] [c10d] The server socket has failed to listen on any local network address. 2023-01-11T22:11:34.6806390Z ok (0.004s) 2023-01-11T22:11:34.6806827Z test_port_already_in_use_on_worker (__main__.DistributedUtilTest) ... [E socket.cpp:860] [c10d] The client socket has timed out after 1s while trying to connect to (7c5487d9c02b, 60761). 
2023-01-11T22:11:34.6807250Z ok (0.001s) 2023-01-11T22:11:34.6807395Z 2023-01-11T22:11:34.6807666Z ---------------------------------------------------------------------- 2023-01-11T22:11:34.6807975Z Ran 7 tests in 4.716s 2023-01-11T22:11:34.6808137Z 2023-01-11T22:11:34.6808230Z OK 2023-01-11T22:11:34.6808366Z 2023-01-11T22:11:34.6808490Z Generating XML reports... 2023-01-11T22:11:34.6809107Z Generated XML report: test-reports/python-unittest/distributed.elastic.utils.distributed_test/TEST-DistributedUtilTest-20230111221129.xml 2023-01-11T22:11:34.6809555Z 2023-01-11T22:11:34.6809863Z ##[endgroup] 2023-01-11T22:11:34.6810511Z FINISHED PRINTING LOG FILE of distributed/elastic/utils/distributed_test (/var/lib/jenkins/workspace/test/test-reports/distributed-elastic-utils-distributed_test_ayg1bdov) 2023-01-11T22:11:34.6810904Z 2023-01-11T22:11:34.6811299Z Running distributed/elastic/timer/local_timer_test ... [2023-01-11 22:11:34.679834] 2023-01-11T22:11:34.6811995Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/elastic/timer/local_timer_test.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:11:34.680082] 2023-01-11T22:11:42.8492949Z 2023-01-11T22:11:42.8493448Z Expand the folded group to see the log file of distributed/elastic/timer/local_timer_test 2023-01-11T22:11:42.8494922Z ##[group]PRINTING LOG FILE of distributed/elastic/timer/local_timer_test (/var/lib/jenkins/workspace/test/test-reports/distributed-elastic-timer-local_timer_test_uson_aby) 2023-01-11T22:11:42.8495436Z 2023-01-11T22:11:42.8495564Z Running tests... 2023-01-11T22:11:42.8496075Z ---------------------------------------------------------------------- 2023-01-11T22:11:42.8497112Z Test results will be stored in test-reports/python-unittest/distributed.elastic.timer.local_timer_test 2023-01-11T22:11:42.8497793Z test_acquire_release (__main__.LocalTimerServerTest) 2023-01-11T22:11:42.8498124Z tests that: ... 
ok (1.607s) 2023-01-11T22:11:42.8498445Z test_expired_timers (__main__.LocalTimerServerTest) 2023-01-11T22:11:42.8498819Z tests that a single expired timer on a process should terminate ... ok (0.002s) 2023-01-11T22:11:42.8499215Z test_valid_timers (__main__.LocalTimerServerTest) 2023-01-11T22:11:42.8499621Z tests that valid timers are processed correctly and the process is left alone ... ok (0.003s) 2023-01-11T22:11:42.8500035Z test_watchdog_call_count (__main__.LocalTimerServerTest) 2023-01-11T22:11:42.8500549Z checks that the watchdog function ran wait/interval +- 1 times ... ok (0.104s) 2023-01-11T22:11:42.8500946Z test_watchdog_empty_queue (__main__.LocalTimerServerTest) 2023-01-11T22:11:42.8501323Z checks that the watchdog can run on an empty queue ... ok (0.011s) 2023-01-11T22:11:42.8501708Z test_client_interaction (__main__.LocalTimerTest) ... ok (0.003s) 2023-01-11T22:11:42.8502079Z test_exception_propagation (__main__.LocalTimerTest) ... ok (0.011s) 2023-01-11T22:11:42.8502457Z test_get_timer_recursive (__main__.LocalTimerTest) 2023-01-11T22:11:42.8503140Z If a function acquires a countdown timer with default scope, ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:11:42.8503634Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:11:42.8504200Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:11:42.8504664Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:11:42.8504969Z ok (2.357s) 2023-01-11T22:11:42.8505246Z test_happy_path (__main__.LocalTimerTest) ... ok (0.103s) 2023-01-11T22:11:42.8505597Z test_no_client (__main__.LocalTimerTest) ... ok (0.011s) 2023-01-11T22:11:42.8506209Z test_timer (__main__.LocalTimerTest) ... ok (0.155s) 2023-01-11T22:11:42.8506573Z test_get (__main__.MultiprocessingRequestQueueTest) ... 
ok (0.023s) 2023-01-11T22:11:42.8507008Z test_get_less_than_size (__main__.MultiprocessingRequestQueueTest) 2023-01-11T22:11:42.8507364Z Tests slow producer. ... ok (0.516s) 2023-01-11T22:11:42.8507696Z test_get_size (__main__.MultiprocessingRequestQueueTest) 2023-01-11T22:11:42.8508092Z Creates a "producer" process that enqueues ``n`` elements ... ok (0.923s) 2023-01-11T22:11:42.8508323Z 2023-01-11T22:11:42.8508600Z ---------------------------------------------------------------------- 2023-01-11T22:11:42.8508927Z Ran 14 tests in 5.832s 2023-01-11T22:11:42.8509069Z 2023-01-11T22:11:42.8509162Z OK 2023-01-11T22:11:42.8509294Z 2023-01-11T22:11:42.8509417Z Generating XML reports... 2023-01-11T22:11:42.8510104Z Generated XML report: test-reports/python-unittest/distributed.elastic.timer.local_timer_test/TEST-LocalTimerServerTest-20230111221136.xml 2023-01-11T22:11:42.8510890Z Generated XML report: test-reports/python-unittest/distributed.elastic.timer.local_timer_test/TEST-LocalTimerTest-20230111221136.xml 2023-01-11T22:11:42.8511829Z Generated XML report: test-reports/python-unittest/distributed.elastic.timer.local_timer_test/TEST-MultiprocessingRequestQueueTest-20230111221136.xml 2023-01-11T22:11:42.8512254Z 2023-01-11T22:11:42.8512577Z ##[endgroup] 2023-01-11T22:11:42.8513216Z FINISHED PRINTING LOG FILE of distributed/elastic/timer/local_timer_test (/var/lib/jenkins/workspace/test/test-reports/distributed-elastic-timer-local_timer_test_uson_aby) 2023-01-11T22:11:42.8513578Z 2023-01-11T22:11:42.8513874Z Running distributed/fsdp/test_fsdp_multiple_forward ... [2023-01-11 22:11:42.849287] 2023-01-11T22:11:42.8514579Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_fsdp_multiple_forward.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... 
[2023-01-11 22:11:42.849543] 2023-01-11T22:11:51.2767595Z 2023-01-11T22:11:51.2768120Z Expand the folded group to see the log file of distributed/fsdp/test_fsdp_multiple_forward 2023-01-11T22:11:51.2769384Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_fsdp_multiple_forward (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_multiple_forward_dr7l489z) 2023-01-11T22:11:51.2769819Z 2023-01-11T22:11:51.2769934Z Running tests... 2023-01-11T22:11:51.2770431Z ---------------------------------------------------------------------- 2023-01-11T22:11:51.2771005Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_multiple_forward 2023-01-11T22:11:51.2771795Z test_multi_forward (__main__.TestMultiForward) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:11:51.2772506Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 50919 2023-01-11T22:11:51.2772957Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 50920 2023-01-11T22:11:51.2773601Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:11:51.2774047Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:11:51.2774602Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:11:51.2775067Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:11:51.2775635Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:11:51.2776056Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:11:51.2776892Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:11:51.2777371Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 
2023-01-11T22:11:51.2777822Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:11:51.2778565Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:11:51.2779234Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:11:51.2779916Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:11:51.2780429Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:11:51.2780877Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:11:51.2781341Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:11:51.2781843Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:11:51.2783204Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:11:51.2783998Z warnings.warn( 2023-01-11T22:11:51.2785131Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:11:51.2785892Z warnings.warn( 2023-01-11T22:11:51.2786144Z dist init r=1, world=2 2023-01-11T22:11:51.2786395Z dist init r=0, world=2 2023-01-11T22:11:51.2786612Z ok (6.066s) 2023-01-11T22:11:51.2786758Z 2023-01-11T22:11:51.2787033Z ---------------------------------------------------------------------- 2023-01-11T22:11:51.2787357Z Ran 1 test in 6.066s 2023-01-11T22:11:51.2787515Z 2023-01-11T22:11:51.2787590Z OK 2023-01-11T22:11:51.2787723Z 2023-01-11T22:11:51.2787849Z Generating XML reports... 2023-01-11T22:11:51.2788460Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_multiple_forward/TEST-TestMultiForward-20230111221144.xml 2023-01-11T22:11:51.2788815Z 2023-01-11T22:11:51.2789115Z ##[endgroup] 2023-01-11T22:11:51.2789749Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_multiple_forward (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_multiple_forward_dr7l489z) 2023-01-11T22:11:51.2790123Z 2023-01-11T22:11:51.2790424Z Running distributed/_shard/sharded_tensor/ops/test_softmax ... [2023-01-11 22:11:51.276765] 2023-01-11T22:11:51.2791148Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/_shard/sharded_tensor/ops/test_softmax.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... 
[2023-01-11 22:11:51.277009] 2023-01-11T22:11:59.8544425Z 2023-01-11T22:11:59.8545222Z Expand the folded group to see the log file of distributed/_shard/sharded_tensor/ops/test_softmax 2023-01-11T22:11:59.8546199Z ##[group]PRINTING LOG FILE of distributed/_shard/sharded_tensor/ops/test_softmax (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-sharded_tensor-ops-test_softmax_rozz9xdq) 2023-01-11T22:11:59.8546865Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9hdmrg1c 2023-01-11T22:11:59.8547413Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9hdmrg1c/_remote_module_non_scriptable.py 2023-01-11T22:11:59.8547719Z 2023-01-11T22:11:59.8547827Z Running tests... 2023-01-11T22:11:59.8548316Z ---------------------------------------------------------------------- 2023-01-11T22:11:59.8549188Z Test results will be stored in test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_softmax 2023-01-11T22:11:59.8549725Z test_sharded_softmax_basic (__main__.TestShardedSoftmax) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:11:59.8550187Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51037 2023-01-11T22:11:59.8550633Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51038 2023-01-11T22:11:59.8551084Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 51039 2023-01-11T22:11:59.8551515Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 51040 2023-01-11T22:11:59.8552130Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:11:59.8552575Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:11:59.8553154Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:11:59.8553609Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:11:59.8554286Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:11:59.8554743Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:11:59.8555316Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:11:59.8555762Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:11:59.8556335Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:11:59.8556774Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:11:59.8557348Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:11:59.8557797Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:11:59.8558374Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:11:59.8558816Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:11:59.8559380Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:11:59.8559865Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:11:59.8560328Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpp8za48ub 2023-01-11T22:11:59.8560864Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpp8za48ub/_remote_module_non_scriptable.py 2023-01-11T22:11:59.8561372Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_u1k60ut 2023-01-11T22:11:59.8561901Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_u1k60ut/_remote_module_non_scriptable.py 2023-01-11T22:11:59.8562439Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:11:59.8562934Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_rbgc4my 2023-01-11T22:11:59.8563444Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_rbgc4my/_remote_module_non_scriptable.py 2023-01-11T22:11:59.8563944Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:11:59.8564434Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvt8ksxk1 2023-01-11T22:11:59.8564970Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvt8ksxk1/_remote_module_non_scriptable.py 2023-01-11T22:11:59.8565451Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:11:59.8565912Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:11:59.8566376Z skip: Need at least 4 CUDA 
devices (3.935s) 2023-01-11T22:11:59.8566852Z test_sharded_softmax_on_sharding_dim (__main__.TestShardedSoftmax) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51173 2023-01-11T22:11:59.8567390Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51174 2023-01-11T22:11:59.8567835Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 51175 2023-01-11T22:11:59.8568276Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 51176 2023-01-11T22:11:59.8568874Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:11:59.8569320Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:11:59.8569891Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:11:59.8570360Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:11:59.8570975Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:11:59.8571425Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:11:59.8571995Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:11:59.8572439Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:11:59.8573007Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:11:59.8573442Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:11:59.8573996Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:11:59.8574424Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:11:59.8574985Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:11:59.8575449Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:11:59.8576016Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:11:59.8576477Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:11:59.8577251Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3t7jkxk6 2023-01-11T22:11:59.8577793Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3t7jkxk6/_remote_module_non_scriptable.py 2023-01-11T22:11:59.8578287Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:11:59.8578786Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpz4xp2w5k 2023-01-11T22:11:59.8579327Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpz4xp2w5k/_remote_module_non_scriptable.py 2023-01-11T22:11:59.8579834Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:11:59.8580311Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpctnchsve 2023-01-11T22:11:59.8580842Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpctnchsve/_remote_module_non_scriptable.py 2023-01-11T22:11:59.8581368Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1hyo7fjg 2023-01-11T22:11:59.8581874Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1hyo7fjg/_remote_module_non_scriptable.py 2023-01-11T22:11:59.8582377Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:11:59.8582844Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:11:59.8583352Z skip: Need at 
least 4 CUDA devices (2.409s) 2023-01-11T22:11:59.8583528Z 2023-01-11T22:11:59.8583816Z ---------------------------------------------------------------------- 2023-01-11T22:11:59.8584147Z Ran 2 tests in 6.345s 2023-01-11T22:11:59.8584310Z 2023-01-11T22:11:59.8584418Z OK (skipped=2) 2023-01-11T22:11:59.8584572Z 2023-01-11T22:11:59.8584679Z Generating XML reports... 2023-01-11T22:11:59.8585305Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_softmax/TEST-TestShardedSoftmax-20230111221153.xml 2023-01-11T22:11:59.8585673Z 2023-01-11T22:11:59.8585991Z ##[endgroup] 2023-01-11T22:11:59.8586636Z FINISHED PRINTING LOG FILE of distributed/_shard/sharded_tensor/ops/test_softmax (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-sharded_tensor-ops-test_softmax_rozz9xdq) 2023-01-11T22:11:59.8587031Z 2023-01-11T22:11:59.8587337Z Running distributed/_shard/sharded_tensor/ops/test_embedding ... [2023-01-11 22:11:59.854507] 2023-01-11T22:11:59.8588071Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/_shard/sharded_tensor/ops/test_embedding.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:11:59.854792] 2023-01-11T22:12:08.5273373Z 2023-01-11T22:12:08.5273880Z Expand the folded group to see the log file of distributed/_shard/sharded_tensor/ops/test_embedding 2023-01-11T22:12:08.5275063Z ##[group]PRINTING LOG FILE of distributed/_shard/sharded_tensor/ops/test_embedding (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-sharded_tensor-ops-test_embedding_fgpxym7a) 2023-01-11T22:12:08.5275850Z 2023-01-11T22:12:08.5276024Z Running tests... 2023-01-11T22:12:08.5276536Z ---------------------------------------------------------------------- 2023-01-11T22:12:08.5277134Z Test results will be stored in test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_embedding 2023-01-11T22:12:08.5277666Z test_sharded_embedding_colwise (__main__.TestShardedEmbedding) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:08.5278172Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51344 2023-01-11T22:12:08.5278625Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51345 2023-01-11T22:12:08.5279069Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 51346 2023-01-11T22:12:08.5279491Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 51347 2023-01-11T22:12:08.5280112Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:08.5280565Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:08.5281121Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:08.5281590Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:08.5282162Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:08.5282611Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:08.5283163Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:08.5283623Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:08.5284198Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:08.5284619Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:08.5285184Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:08.5285646Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:08.5286213Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:08.5286783Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:08.5287360Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:08.5287819Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:08.5288252Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:12:08.5288706Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:08.5289165Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:12:08.5289631Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:08.5290004Z skip: Need at least 4 CUDA devices (4.069s) 2023-01-11T22:12:08.5290494Z test_sharded_embedding_rowwise (__main__.TestShardedEmbedding) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51480 2023-01-11T22:12:08.5291039Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51481 2023-01-11T22:12:08.5291566Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 51482 2023-01-11T22:12:08.5292002Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 51483 2023-01-11T22:12:08.5292608Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:08.5293052Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:08.5293588Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:08.5294032Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:08.5294599Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:08.5295067Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:08.5295631Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:08.5296091Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:08.5296954Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:08.5297411Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:08.5297973Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:08.5298431Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:08.5299003Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
56 slow tests 2023-01-11T22:12:08.5299429Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:08.5299995Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:08.5300457Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:08.5300891Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:12:08.5301344Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:12:08.5301806Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:08.5302269Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:08.5302640Z skip: Need at least 4 CUDA devices (2.310s) 2023-01-11T22:12:08.5302834Z 2023-01-11T22:12:08.5303106Z ---------------------------------------------------------------------- 2023-01-11T22:12:08.5303566Z Ran 2 tests in 6.379s 2023-01-11T22:12:08.5303730Z 2023-01-11T22:12:08.5303838Z OK (skipped=2) 2023-01-11T22:12:08.5303974Z 2023-01-11T22:12:08.5304100Z Generating XML reports... 2023-01-11T22:12:08.5304749Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_embedding/TEST-TestShardedEmbedding-20230111221201.xml 2023-01-11T22:12:08.5305130Z 2023-01-11T22:12:08.5305448Z ##[endgroup] 2023-01-11T22:12:08.5306104Z FINISHED PRINTING LOG FILE of distributed/_shard/sharded_tensor/ops/test_embedding (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-sharded_tensor-ops-test_embedding_fgpxym7a) 2023-01-11T22:12:08.5306500Z 2023-01-11T22:12:08.5306783Z Running distributed/test_c10d_error_logger ... 
[2023-01-11 22:12:08.527378] 2023-01-11T22:12:08.5307458Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/test_c10d_error_logger.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:12:08.527636] 2023-01-11T22:12:18.6109094Z 2023-01-11T22:12:18.6109558Z Expand the folded group to see the log file of distributed/test_c10d_error_logger 2023-01-11T22:12:18.6110774Z ##[group]PRINTING LOG FILE of distributed/test_c10d_error_logger (/var/lib/jenkins/workspace/test/test-reports/distributed-test_c10d_error_logger_ztg95o7z) 2023-01-11T22:12:18.6111217Z 2023-01-11T22:12:18.6111333Z Running tests... 2023-01-11T22:12:18.6111850Z ---------------------------------------------------------------------- 2023-01-11T22:12:18.6112397Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_error_logger 2023-01-11T22:12:18.6112917Z test_exception_handler_with_dist (__main__.C10dErrorLoggerTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:18.6113406Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51651 2023-01-11T22:12:18.6113887Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51652 2023-01-11T22:12:18.6114487Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:18.6114950Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:18.6115524Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:18.6115992Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:18.6116548Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:18.6116987Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:18.6117553Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:18.6117996Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:18.6118428Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:18.6118921Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:18.6119401Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:18.6119868Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:18.6120519Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:18.6121193Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:18.6121562Z ok (5.499s) 2023-01-11T22:12:18.6121992Z test_get_or_create_logger (__main__.C10dErrorLoggerTest) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51730 2023-01-11T22:12:18.6122511Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51731 2023-01-11T22:12:18.6123263Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:18.6123696Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:18.6124269Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:18.6124739Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:18.6125313Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:18.6125735Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:18.6126300Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:18.6126759Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:18.6127173Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:18.6127644Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:18.6127981Z ok (2.307s) 2023-01-11T22:12:18.6128185Z 2023-01-11T22:12:18.6128467Z ---------------------------------------------------------------------- 2023-01-11T22:12:18.6128777Z Ran 2 tests in 7.806s 2023-01-11T22:12:18.6128938Z 2023-01-11T22:12:18.6129032Z OK 2023-01-11T22:12:18.6129165Z 2023-01-11T22:12:18.6129290Z Generating XML reports... 
2023-01-11T22:12:18.6129863Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_error_logger/TEST-C10dErrorLoggerTest-20230111221210.xml 2023-01-11T22:12:18.6130209Z 2023-01-11T22:12:18.6130516Z ##[endgroup] 2023-01-11T22:12:18.6131105Z FINISHED PRINTING LOG FILE of distributed/test_c10d_error_logger (/var/lib/jenkins/workspace/test/test-reports/distributed-test_c10d_error_logger_ztg95o7z) 2023-01-11T22:12:18.6131449Z 2023-01-11T22:12:18.6131735Z Running distributed/_shard/sharded_tensor/ops/test_linear ... [2023-01-11 22:12:18.610957] 2023-01-11T22:12:18.6132465Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/_shard/sharded_tensor/ops/test_linear.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:12:18.611221] 2023-01-11T22:12:29.7450322Z 2023-01-11T22:12:29.7451076Z Expand the folded group to see the log file of distributed/_shard/sharded_tensor/ops/test_linear 2023-01-11T22:12:29.7452289Z ##[group]PRINTING LOG FILE of distributed/_shard/sharded_tensor/ops/test_linear (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-sharded_tensor-ops-test_linear_o48tmlqt) 2023-01-11T22:12:29.7452701Z 2023-01-11T22:12:29.7452817Z Running tests... 2023-01-11T22:12:29.7453316Z ---------------------------------------------------------------------- 2023-01-11T22:12:29.7453884Z Test results will be stored in test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_linear 2023-01-11T22:12:29.7454465Z test_sharded_linear_colwise (__main__.TestShardedTensorOpsLinear) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:29.7454970Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51833 2023-01-11T22:12:29.7455425Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51834 2023-01-11T22:12:29.7456114Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 51835 2023-01-11T22:12:29.7457441Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 51836 2023-01-11T22:12:29.7458777Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:29.7459791Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:29.7460535Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:29.7461015Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:29.7461859Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:29.7462295Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:29.7462866Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:29.7463334Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:29.7463906Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:29.7464328Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:29.7464945Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:29.7465410Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:29.7465989Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:29.7466435Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:29.7467079Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:29.7467560Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:29.7467998Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:12:29.7468454Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:29.7468913Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:29.7469381Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:12:29.7469775Z skip: Need at least 4 CUDA devices (4.036s) 2023-01-11T22:12:29.7470259Z test_sharded_linear_errors (__main__.TestShardedTensorOpsLinear) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51969 2023-01-11T22:12:29.7470815Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51970 2023-01-11T22:12:29.7471268Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 51971 2023-01-11T22:12:29.7471693Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 51972 2023-01-11T22:12:29.7472305Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:29.7472751Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:29.7473323Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:29.7473768Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:29.7474347Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:29.7474793Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:29.7475346Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:29.7475812Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:29.7476385Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:29.7476829Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:29.7477373Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:29.7477834Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:29.7478404Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
56 slow tests 2023-01-11T22:12:29.7478914Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:29.7479469Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:29.7479927Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:29.7480362Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:29.7480816Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:12:29.7481285Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:12:29.7481742Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:29.7482132Z skip: Need at least 4 CUDA devices (2.411s) 2023-01-11T22:12:29.7482613Z test_sharded_linear_rowwise (__main__.TestShardedTensorOpsLinear) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52105 2023-01-11T22:12:29.7483169Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52106 2023-01-11T22:12:29.7483673Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 52107 2023-01-11T22:12:29.7484107Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 52108 2023-01-11T22:12:29.7484718Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:29.7485170Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:29.7485743Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:29.7486191Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:29.7486765Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:29.7487212Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:29.7487783Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:29.7488228Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:29.7488801Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:29.7489242Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:29.7489789Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:29.7490250Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:29.7490824Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:29.7491266Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:29.7491812Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:29.7492277Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:29.7492714Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:29.7493170Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:12:29.7493636Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:29.7494090Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:12:29.7494485Z skip: Need at least 4 CUDA devices (2.410s) 
2023-01-11T22:12:29.7494677Z 2023-01-11T22:12:29.7494932Z ---------------------------------------------------------------------- 2023-01-11T22:12:29.7495327Z Ran 3 tests in 8.858s 2023-01-11T22:12:29.7495490Z 2023-01-11T22:12:29.7495601Z OK (skipped=3) 2023-01-11T22:12:29.7495754Z 2023-01-11T22:12:29.7495862Z Generating XML reports... 2023-01-11T22:12:29.7496527Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_linear/TEST-TestShardedTensorOpsLinear-20230111221220.xml 2023-01-11T22:12:29.7497540Z 2023-01-11T22:12:29.7498087Z ##[endgroup] 2023-01-11T22:12:29.7499320Z FINISHED PRINTING LOG FILE of distributed/_shard/sharded_tensor/ops/test_linear (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-sharded_tensor-ops-test_linear_o48tmlqt) 2023-01-11T22:12:29.7500055Z 2023-01-11T22:12:29.7500525Z Running distributed/fsdp/test_fsdp_pure_fp16 ... [2023-01-11 22:12:29.745080] 2023-01-11T22:12:29.7501220Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_fsdp_pure_fp16.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:12:29.745339] 2023-01-11T22:12:42.4998447Z 2023-01-11T22:12:42.4999069Z Expand the folded group to see the log file of distributed/fsdp/test_fsdp_pure_fp16 2023-01-11T22:12:42.5000437Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_fsdp_pure_fp16 (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_pure_fp16_5bieqnya) 2023-01-11T22:12:42.5000920Z 2023-01-11T22:12:42.5001041Z Running tests... 2023-01-11T22:12:42.5001642Z ---------------------------------------------------------------------- 2023-01-11T22:12:42.5002220Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_pure_fp16 2023-01-11T22:12:42.5002693Z test_pure_fp16_cpu_offload_CPUOffload(offload_params=False) (__main__.TestPureFP16) 2023-01-11T22:12:42.5003284Z Tests pure FP16 training, including when the parameter's dtype is ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:42.5003773Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52276 2023-01-11T22:12:42.5004232Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52277 2023-01-11T22:12:42.5005035Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:42.5005494Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:42.5006072Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:42.5006527Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:42.5007102Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:42.5007546Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:42.5008113Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:42.5008554Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:42.5009010Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:42.5009507Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:42.5010144Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:42.5010829Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:42.5011345Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:42.5011866Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:42.5013127Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:12:42.5014073Z warnings.warn( 2023-01-11T22:12:42.5015233Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:12:42.5016004Z warnings.warn( 2023-01-11T22:12:42.5016253Z dist init r=0, world=2 2023-01-11T22:12:42.5016485Z dist init r=1, world=2 2023-01-11T22:12:42.5017027Z ok (6.019s) 2023-01-11T22:12:42.5017383Z test_pure_fp16_cpu_offload_CPUOffload(offload_params=True) (__main__.TestPureFP16) 2023-01-11T22:12:42.5018149Z Tests pure FP16 training, including when the parameter's dtype is ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52359 2023-01-11T22:12:42.5018703Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52360 2023-01-11T22:12:42.5019312Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:42.5019760Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:42.5020311Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:42.5020777Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:42.5021348Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:42.5021797Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:42.5022351Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:42.5022814Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:42.5023267Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:42.5023746Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:42.5024402Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:42.5025088Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:42.5025603Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:42.5026054Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:42.5027320Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:12:42.5028096Z warnings.warn( 2023-01-11T22:12:42.5029237Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:12:42.5030098Z warnings.warn( 2023-01-11T22:12:42.5030351Z File "<string>", line 1, in <module> 2023-01-11T22:12:42.5030723Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:12:42.5031092Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:12:42.5031441Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:12:42.5031810Z return self._bootstrap(parent_sentinel) 2023-01-11T22:12:42.5032199Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:12:42.5032533Z self.run() 2023-01-11T22:12:42.5032845Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:12:42.5033212Z self._target(*self._args, **self._kwargs) 2023-01-11T22:12:42.5033728Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:12:42.5034100Z self.run_test(test_name, pipe) 2023-01-11T22:12:42.5034678Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:12:42.5035078Z getattr(self, test_name)() 2023-01-11T22:12:42.5035593Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:12:42.5035941Z fn() 2023-01-11T22:12:42.5036426Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:12:42.5036814Z test(self, **param_kwargs) 2023-01-11T22:12:42.5037305Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:12:42.5037699Z return func(*args, **kwargs) 2023-01-11T22:12:42.5038229Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_pure_fp16.py", line 47, in test_pure_fp16 2023-01-11T22:12:42.5038683Z self._test_fsdp_parity( 2023-01-11T22:12:42.5039216Z File
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:12:42.5039633Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:12:42.5040185Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:12:42.5040558Z output = model(*input) 2023-01-11T22:12:42.5041031Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:12:42.5041410Z return forward_call(*args, **kwargs) 2023-01-11T22:12:42.5041929Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:12:42.5042382Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:12:42.5042941Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:12:42.5043328Z _lazy_init(state, module) 2023-01-11T22:12:42.5043818Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:12:42.5044247Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:12:42.5044832Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:12:42.5045242Z handle.init_flat_param_attributes() 2023-01-11T22:12:42.5045748Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:12:42.5046124Z return func(*args, **kwargs) 2023-01-11T22:12:42.5046652Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:12:42.5047102Z p_assert( 2023-01-11T22:12:42.5047571Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in 
p_assert 2023-01-11T22:12:42.5047946Z traceback.print_stack() 2023-01-11T22:12:42.5048216Z File "<string>", line 1, in <module> 2023-01-11T22:12:42.5048580Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:12:42.5048947Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:12:42.5049299Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:12:42.5049667Z return self._bootstrap(parent_sentinel) 2023-01-11T22:12:42.5050049Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:12:42.5050379Z self.run() 2023-01-11T22:12:42.5050692Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:12:42.5051054Z self._target(*self._args, **self._kwargs) 2023-01-11T22:12:42.5051569Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:12:42.5051935Z self.run_test(test_name, pipe) 2023-01-11T22:12:42.5052518Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:12:42.5052918Z getattr(self, test_name)() 2023-01-11T22:12:42.5053428Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:12:42.5053774Z fn() 2023-01-11T22:12:42.5054261Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:12:42.5054649Z test(self, **param_kwargs) 2023-01-11T22:12:42.5055139Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:12:42.5055527Z return func(*args, **kwargs) 2023-01-11T22:12:42.5055923Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_pure_fp16.py", line 47, in test_pure_fp16 2023-01-11T22:12:42.5056276Z self._test_fsdp_parity( 2023-01-11T22:12:42.5057008Z File
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:12:42.5057430Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:12:42.5057984Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:12:42.5058357Z output = model(*input) 2023-01-11T22:12:42.5058831Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:12:42.5059214Z return forward_call(*args, **kwargs) 2023-01-11T22:12:42.5059731Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:12:42.5060181Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:12:42.5060744Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:12:42.5061131Z _lazy_init(state, module) 2023-01-11T22:12:42.5061617Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:12:42.5062046Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:12:42.5062631Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:12:42.5063040Z handle.init_flat_param_attributes() 2023-01-11T22:12:42.5063545Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:12:42.5063923Z return func(*args, **kwargs) 2023-01-11T22:12:42.5064479Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:12:42.5064955Z p_assert( 2023-01-11T22:12:42.5065425Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in 
p_assert 2023-01-11T22:12:42.5065807Z traceback.print_stack() 2023-01-11T22:12:42.5066055Z dist init r=1, world=2 2023-01-11T22:12:42.5066526Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:12:42.5066969Z dist init r=0, world=2 2023-01-11T22:12:42.5067433Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:12:42.5067841Z ok (4.411s) 2023-01-11T22:12:42.5067989Z 2023-01-11T22:12:42.5068261Z ---------------------------------------------------------------------- 2023-01-11T22:12:42.5068592Z Ran 2 tests in 10.430s 2023-01-11T22:12:42.5068752Z 2023-01-11T22:12:42.5068833Z OK 2023-01-11T22:12:42.5068965Z 2023-01-11T22:12:42.5069088Z Generating XML reports... 2023-01-11T22:12:42.5069755Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_pure_fp16/TEST-TestPureFP16-20230111221231.xml 2023-01-11T22:12:42.5070107Z 2023-01-11T22:12:42.5070420Z ##[endgroup] 2023-01-11T22:12:42.5071032Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_pure_fp16 (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_pure_fp16_5bieqnya) 2023-01-11T22:12:42.5071383Z 2023-01-11T22:12:42.5071703Z Running distributed/_shard/sharded_tensor/ops/test_elementwise_ops ... [2023-01-11 22:12:42.499944] 2023-01-11T22:12:42.5072451Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/_shard/sharded_tensor/ops/test_elementwise_ops.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... 
[2023-01-11 22:12:42.500202] 2023-01-11T22:12:55.7631345Z 2023-01-11T22:12:55.7632139Z Expand the folded group to see the log file of distributed/_shard/sharded_tensor/ops/test_elementwise_ops 2023-01-11T22:12:55.7633223Z ##[group]PRINTING LOG FILE of distributed/_shard/sharded_tensor/ops/test_elementwise_ops (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-sharded_tensor-ops-test_elementwise_ops_y69knemy) 2023-01-11T22:12:55.7633640Z 2023-01-11T22:12:55.7633757Z Running tests... 2023-01-11T22:12:55.7634299Z ---------------------------------------------------------------------- 2023-01-11T22:12:55.7634914Z Test results will be stored in test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_elementwise_ops 2023-01-11T22:12:55.7635493Z test_sharded_dropout (__main__.TestShardedTensorElementWiseOps) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:55.7635981Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52477 2023-01-11T22:12:55.7636436Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52478 2023-01-11T22:12:55.7636888Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 52479 2023-01-11T22:12:55.7637310Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 52480 2023-01-11T22:12:55.7637942Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:55.7638397Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:55.7638975Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:55.7639430Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:55.7640007Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:55.7640451Z 
warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:55.7641027Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:55.7641724Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:55.7642309Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:55.7642750Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:55.7643296Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:55.7643758Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:55.7644329Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:55.7644769Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:55.7645319Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:55.7645781Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:55.7646222Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:12:55.7646783Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:55.7647265Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:55.7647720Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:12:55.7648113Z skip: Need at least 4 CUDA devices (4.007s) 2023-01-11T22:12:55.7648598Z test_sharded_gelu (__main__.TestShardedTensorElementWiseOps) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52613 2023-01-11T22:12:55.7649145Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52614 2023-01-11T22:12:55.7649593Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 52615 2023-01-11T22:12:55.7650040Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 52616 2023-01-11T22:12:55.7650636Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:55.7651089Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:55.7651661Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:55.7652110Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:55.7652685Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:55.7653128Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:55.7653695Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:55.7654198Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:55.7654757Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:55.7655206Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:55.7655775Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:55.7656242Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:55.7657082Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
56 slow tests 2023-01-11T22:12:55.7657532Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:55.7658105Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:55.7658551Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:55.7659107Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:55.7659579Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:12:55.7660053Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:55.7660500Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:12:55.7660889Z skip: Need at least 4 CUDA devices (2.309s) 2023-01-11T22:12:55.7661389Z test_sharded_relu (__main__.TestShardedTensorElementWiseOps) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52749 2023-01-11T22:12:55.7661917Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52750 2023-01-11T22:12:55.7662361Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 52751 2023-01-11T22:12:55.7662797Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 52752 2023-01-11T22:12:55.7663416Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:55.7663930Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:55.7664519Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:55.7664984Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:55.7665542Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:55.7665983Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:55.7666550Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:55.7667012Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:55.7667571Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:55.7668020Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:55.7668590Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:55.7669050Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:55.7669602Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:55.7670046Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:55.7670611Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:55.7671057Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:55.7671494Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:55.7671971Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:12:55.7672443Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:55.7672891Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:12:55.7673277Z skip: Need at least 4 CUDA devices (2.410s) 
2023-01-11T22:12:55.7673793Z test_sharded_tensor_nan_to_num (__main__.TestShardedTensorElementWiseOps) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52885 2023-01-11T22:12:55.7674334Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52886 2023-01-11T22:12:55.7674775Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 52887 2023-01-11T22:12:55.7675210Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 52888 2023-01-11T22:12:55.7675887Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:55.7676316Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:55.7676879Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:55.7677325Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:55.7677897Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:55.7678342Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:55.7678923Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:55.7679384Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:55.7679939Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:55.7680392Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:55.7681012Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:55.7681483Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 
2023-01-11T22:12:55.7682041Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:12:55.7682482Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:55.7683049Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:55.7683490Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:55.7683927Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:12:55.7684408Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:12:55.7684878Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:55.7685321Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:55.7685708Z skip: Need at least 4 CUDA devices (2.309s) 2023-01-11T22:12:55.7685903Z 2023-01-11T22:12:55.7686178Z ---------------------------------------------------------------------- 2023-01-11T22:12:55.7686490Z Ran 4 tests in 11.036s 2023-01-11T22:12:55.7686653Z 2023-01-11T22:12:55.7686762Z OK (skipped=4) 2023-01-11T22:12:55.7686917Z 2023-01-11T22:12:55.7687044Z Generating XML reports... 2023-01-11T22:12:55.7687746Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_elementwise_ops/TEST-TestShardedTensorElementWiseOps-20230111221244.xml 2023-01-11T22:12:55.7688173Z 2023-01-11T22:12:55.7688478Z ##[endgroup] 2023-01-11T22:12:55.7689187Z FINISHED PRINTING LOG FILE of distributed/_shard/sharded_tensor/ops/test_elementwise_ops (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-sharded_tensor-ops-test_elementwise_ops_y69knemy) 2023-01-11T22:12:55.7689604Z 2023-01-11T22:12:55.7689910Z Running distributed/_shard/sharding_plan/test_sharding_plan ... 
[2023-01-11 22:12:55.763222] 2023-01-11T22:12:55.7690642Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/_shard/sharding_plan/test_sharding_plan.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:12:55.763476] 2023-01-11T22:13:11.6703635Z 2023-01-11T22:13:11.6704438Z Expand the folded group to see the log file of distributed/_shard/sharding_plan/test_sharding_plan 2023-01-11T22:13:11.6705829Z ##[group]PRINTING LOG FILE of distributed/_shard/sharding_plan/test_sharding_plan (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-sharding_plan-test_sharding_plan_pxx9u7vl) 2023-01-11T22:13:11.6706677Z 2023-01-11T22:13:11.6706795Z Running tests... 2023-01-11T22:13:11.6707335Z ---------------------------------------------------------------------- 2023-01-11T22:13:11.6707937Z Test results will be stored in test-reports/python-unittest/distributed._shard.sharding_plan.test_sharding_plan 2023-01-11T22:13:11.6708455Z test_custom_sharding_planner (__main__.TestShardingPlan) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:13:11.6708942Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 53056 2023-01-11T22:13:11.6709394Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 53057 2023-01-11T22:13:11.6709966Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 53058 2023-01-11T22:13:11.6710783Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 53059 2023-01-11T22:13:11.6712060Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:11.6712775Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:11.6713394Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:11.6713989Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:11.6714589Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:11.6715034Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:11.6715587Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:11.6716049Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:11.6716619Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:11.6717056Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:11.6717610Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:11.6718139Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:11.6718713Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:11.6719154Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:11.6719698Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:11.6720167Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:11.6720601Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:13:11.6721074Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:13:11.6721522Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:13:11.6721974Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:13:11.6722370Z skip: Need at least 4 CUDA devices (4.054s) 2023-01-11T22:13:11.6722834Z test_reshard_to_ddp_sharding_plan (__main__.TestShardingPlan) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 53192 2023-01-11T22:13:11.6723362Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 53193 2023-01-11T22:13:11.6723805Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 53194 2023-01-11T22:13:11.6724243Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 53195 2023-01-11T22:13:11.6724835Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:11.6725284Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:11.6725932Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:11.6726400Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:11.6726954Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:11.6727395Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:11.6727958Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:11.6728400Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:11.6728968Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:11.6729409Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:11.6729975Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:11.6730412Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:11.6731039Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
56 slow tests 2023-01-11T22:13:11.6731488Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:11.6732034Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:11.6732493Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:11.6732926Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:13:11.6733398Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:13:11.6733848Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:13:11.6734307Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:13:11.6734700Z skip: Need at least 4 CUDA devices (2.412s) 2023-01-11T22:13:11.6735181Z test_shard_module_sub_process_group (__main__.TestShardingPlan) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 53328 2023-01-11T22:13:11.6735696Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 53329 2023-01-11T22:13:11.6736141Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 53330 2023-01-11T22:13:11.6736853Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 53331 2023-01-11T22:13:11.6737469Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:11.6737915Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:11.6738488Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:11.6738953Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:11.6739512Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:11.6739953Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:11.6740518Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:11.6740960Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:11.6741530Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:11.6741971Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:11.6742535Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:11.6743092Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:11.6743673Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:11.6744110Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:11.6744674Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:11.6745111Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:11.6745542Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:13:11.6746007Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:13:11.6746450Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:13:11.6746909Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:13:11.6747298Z skip: Need at least 4 CUDA devices (2.410s) 
2023-01-11T22:13:11.6747837Z test_sharding_plan_errors (__main__.TestShardingPlan) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 53464 2023-01-11T22:13:11.6748351Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 53465 2023-01-11T22:13:11.6748791Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 53466 2023-01-11T22:13:11.6749225Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 53467 2023-01-11T22:13:11.6749813Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:11.6750259Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:11.6750823Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:11.6751289Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:11.6751847Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:11.6752289Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:11.6752850Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:11.6753310Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:11.6753868Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:11.6754304Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:11.6754865Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:11.6755312Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:11.6755884Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:11.6756324Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:11.6756886Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:11.6757323Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:11.6757754Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:13:11.6758221Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:13:11.6758665Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:13:11.6759126Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:13:11.6759575Z skip: Need at least 4 CUDA devices (2.311s) 2023-01-11T22:13:11.6760058Z test_sharding_plan_simple_megatron (__main__.TestShardingPlan) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 53600 2023-01-11T22:13:11.6760576Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 53601 2023-01-11T22:13:11.6761018Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 53602 2023-01-11T22:13:11.6761458Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 53603 2023-01-11T22:13:11.6762045Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:11.6762492Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:11.6763059Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:11.6763525Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:11.6764077Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:11.6764585Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:11.6765164Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:11.6765623Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:11.6766173Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:11.6766611Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:11.6767172Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:11.6767611Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:11.6768183Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
56 slow tests 2023-01-11T22:13:11.6768628Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:11.6769193Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:11.6769629Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:11.6770059Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:13:11.6770529Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:13:11.6770969Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:13:11.6771433Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:13:11.6771818Z skip: Need at least 4 CUDA devices (2.412s) 2023-01-11T22:13:11.6772012Z 2023-01-11T22:13:11.6772286Z ---------------------------------------------------------------------- 2023-01-11T22:13:11.6772599Z Ran 5 tests in 13.599s 2023-01-11T22:13:11.6772761Z 2023-01-11T22:13:11.6772874Z OK (skipped=5) 2023-01-11T22:13:11.6773027Z 2023-01-11T22:13:11.6773150Z Generating XML reports... 2023-01-11T22:13:11.6773752Z Generated XML report: test-reports/python-unittest/distributed._shard.sharding_plan.test_sharding_plan/TEST-TestShardingPlan-20230111221257.xml 2023-01-11T22:13:11.6774120Z 2023-01-11T22:13:11.6774447Z ##[endgroup] 2023-01-11T22:13:11.6775116Z FINISHED PRINTING LOG FILE of distributed/_shard/sharding_plan/test_sharding_plan (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-sharding_plan-test_sharding_plan_pxx9u7vl) 2023-01-11T22:13:11.6775506Z 2023-01-11T22:13:11.6775762Z Running distributed/_tensor/test_api ... [2023-01-11 22:13:11.670504] 2023-01-11T22:13:11.6776466Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/_tensor/test_api.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... 
[2023-01-11 22:13:11.670762] 2023-01-11T22:13:27.4425565Z 2023-01-11T22:13:27.4426099Z Expand the folded group to see the log file of distributed/_tensor/test_api 2023-01-11T22:13:27.4426998Z ##[group]PRINTING LOG FILE of distributed/_tensor/test_api (/var/lib/jenkins/workspace/test/test-reports/distributed-_tensor-test_api_qp88b7b4) 2023-01-11T22:13:27.4427362Z 2023-01-11T22:13:27.4427477Z Running tests... 2023-01-11T22:13:27.4427989Z ---------------------------------------------------------------------- 2023-01-11T22:13:27.4428515Z Test results will be stored in test-reports/python-unittest/distributed._tensor.test_api 2023-01-11T22:13:27.4429009Z test_distribute_module (__main__.DTensorAPITest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:13:27.4429514Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 53771 2023-01-11T22:13:27.4430036Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 53772 2023-01-11T22:13:27.4430484Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 53773 2023-01-11T22:13:27.4431141Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 53774 2023-01-11T22:13:27.4431801Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:27.4432263Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:27.4432835Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:27.4433288Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:27.4433866Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:27.4434313Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:27.4434852Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:27.4435299Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:27.4435873Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:27.4436337Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:27.4436899Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:27.4437365Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:27.4437941Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:27.4438365Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:27.4438934Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:27.4439399Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:27.4439838Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:13:27.4440294Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:13:27.4440752Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:13:27.4441213Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:13:27.4441605Z skip: Need at least 4 CUDA devices (3.997s) 2023-01-11T22:13:27.4442075Z test_distribute_module_input_fn_output_fn (__main__.DTensorAPITest) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 53907 2023-01-11T22:13:27.4442611Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 53908 2023-01-11T22:13:27.4443185Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 53909 2023-01-11T22:13:27.4443616Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 53910 2023-01-11T22:13:27.4444225Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:27.4444675Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:27.4445248Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:27.4445697Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:27.4446270Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:27.4446711Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:27.4447261Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:27.4447733Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:27.4448376Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:27.4448826Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:27.4449378Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:27.4449840Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:27.4450409Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
56 slow tests 2023-01-11T22:13:27.4450849Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:27.4451393Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:27.4451861Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:27.4452297Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:13:27.4452754Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:13:27.4453210Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:13:27.4453672Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:13:27.4454061Z skip: Need at least 4 CUDA devices (2.310s) 2023-01-11T22:13:27.4454505Z test_distribute_tensor (__main__.DTensorAPITest) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 54043 2023-01-11T22:13:27.4455014Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 54044 2023-01-11T22:13:27.4455464Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 54045 2023-01-11T22:13:27.4455889Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 54046 2023-01-11T22:13:27.4456500Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:27.4457257Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:27.4457848Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:27.4458296Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:27.4458872Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: 
UserWarning: loaded 56 slow tests 2023-01-11T22:13:27.4459316Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:27.4459865Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:27.4460460Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:27.4461045Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:27.4461486Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:27.4462034Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:27.4462493Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:27.4463062Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:27.4463501Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:27.4464045Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:27.4464513Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:27.4464949Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:13:27.4465480Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:13:27.4465963Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:13:27.4466420Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:13:27.4466810Z skip: Need at least 4 CUDA devices (2.309s) 2023-01-11T22:13:27.4467262Z test_distribute_tensor_errors (__main__.DTensorAPITest) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 54179 2023-01-11T22:13:27.4467785Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 54180 2023-01-11T22:13:27.4468233Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 54181 2023-01-11T22:13:27.4468659Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 54182 2023-01-11T22:13:27.4469278Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:27.4469724Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:27.4470291Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:27.4470742Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:27.4471319Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:27.4471760Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:27.4472324Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:27.4472771Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:27.4473343Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:27.4473786Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:27.4474334Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:27.4474794Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:27.4475366Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
56 slow tests 2023-01-11T22:13:27.4475837Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:27.4476386Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:27.4476841Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:27.4477354Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:13:27.4477829Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:13:27.4478274Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:13:27.4478731Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:13:27.4479121Z skip: Need at least 4 CUDA devices (2.410s) 2023-01-11T22:13:27.4479586Z test_distribute_tensor_uneven_sharding (__main__.DTensorAPITest) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 54315 2023-01-11T22:13:27.4480118Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 54316 2023-01-11T22:13:27.4480569Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 54317 2023-01-11T22:13:27.4481010Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 54318 2023-01-11T22:13:27.4481608Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:27.4482107Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:27.4482695Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:27.4483164Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:27.4483717Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:27.4484154Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:27.4484718Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:27.4485163Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:27.4485737Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:27.4486177Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:27.4486743Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:27.4487180Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:27.4487750Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:27.4488186Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:27.4488733Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:27.4489190Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:27.4489625Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:13:27.4490100Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:13:27.4490550Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:13:27.4491008Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:13:27.4491396Z skip: Need at least 4 CUDA devices (2.410s) 
2023-01-11T22:13:27.4491589Z 2023-01-11T22:13:27.4491864Z ---------------------------------------------------------------------- 2023-01-11T22:13:27.4492173Z Ran 5 tests in 13.438s 2023-01-11T22:13:27.4492336Z 2023-01-11T22:13:27.4492445Z OK (skipped=5) 2023-01-11T22:13:27.4492598Z 2023-01-11T22:13:27.4492723Z Generating XML reports... 2023-01-11T22:13:27.4493269Z Generated XML report: test-reports/python-unittest/distributed._tensor.test_api/TEST-DTensorAPITest-20230111221313.xml 2023-01-11T22:13:27.4493669Z 2023-01-11T22:13:27.4493993Z ##[endgroup] 2023-01-11T22:13:27.4494571Z FINISHED PRINTING LOG FILE of distributed/_tensor/test_api (/var/lib/jenkins/workspace/test/test-reports/distributed-_tensor-test_api_qp88b7b4) 2023-01-11T22:13:27.4494908Z 2023-01-11T22:13:27.4495172Z Running distributed/_composable/test_replicate ... [2023-01-11 22:13:27.442620] 2023-01-11T22:13:27.4495873Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/_composable/test_replicate.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:13:27.442875] 2023-01-11T22:13:43.6883341Z 2023-01-11T22:13:43.6884073Z Expand the folded group to see the log file of distributed/_composable/test_replicate 2023-01-11T22:13:43.6885039Z ##[group]PRINTING LOG FILE of distributed/_composable/test_replicate (/var/lib/jenkins/workspace/test/test-reports/distributed-_composable-test_replicate_1hpoka8r) 2023-01-11T22:13:43.6885417Z 2023-01-11T22:13:43.6885562Z Running tests... 2023-01-11T22:13:43.6886063Z ---------------------------------------------------------------------- 2023-01-11T22:13:43.6886663Z Test results will be stored in test-reports/python-unittest/distributed._composable.test_replicate 2023-01-11T22:13:43.6887409Z test_replicate_non_root_multiple_save_load (__main__.ReplicateStateDictTest) 2023-01-11T22:13:43.6887895Z Tests tha replicate() on multiple submodules matches ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:13:43.6888379Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 54486 2023-01-11T22:13:43.6888841Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 54487 2023-01-11T22:13:43.6889294Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 54488 2023-01-11T22:13:43.6889711Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 54489 2023-01-11T22:13:43.6890342Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:43.6890808Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:43.6891364Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:43.6891843Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:43.6892426Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:43.6892877Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:43.6893425Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:43.6893893Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:43.6894465Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:43.6894909Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:43.6895463Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:43.6895934Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:43.6896510Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:43.6897283Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:43.6897872Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:43.6898339Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:43.6898780Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:13:43.6899234Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:13:43.6899834Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:13:43.6900302Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:13:43.6900722Z ok (4.023s) 2023-01-11T22:13:43.6901070Z test_replicate_single_module_save_load (__main__.ReplicateStateDictTest) 2023-01-11T22:13:43.6901576Z Tests that replicate() on a single module state_dict ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 54622 2023-01-11T22:13:43.6902098Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 54623 2023-01-11T22:13:43.6902545Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 54624 2023-01-11T22:13:43.6902985Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 54625 2023-01-11T22:13:43.6903582Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:43.6904039Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:43.6904611Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:43.6905135Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:43.6905735Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:43.6906184Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:43.6906757Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:43.6907202Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:43.6907775Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:43.6908213Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:43.6908786Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:43.6909232Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:43.6909806Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
56 slow tests 2023-01-11T22:13:43.6910245Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:43.6910790Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:43.6911252Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:43.6911689Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:13:43.6912159Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:13:43.6912610Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:13:43.6913065Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:13:43.6913409Z ok (2.410s) 2023-01-11T22:13:43.6913816Z test_replicate_multi_module (__main__.ReplicateTest) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 54758 2023-01-11T22:13:43.6914391Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 54759 2023-01-11T22:13:43.6914845Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 54760 2023-01-11T22:13:43.6915292Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 54761 2023-01-11T22:13:43.6915885Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:43.6916327Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:43.6916982Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:43.6917431Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:43.6918014Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 
2023-01-11T22:13:43.6918458Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:43.6919026Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:43.6919470Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:43.6920040Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:43.6920481Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:43.6921049Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:43.6921493Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:43.6922125Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:43.6922579Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:43.6923132Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:43.6923590Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:43.6924026Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:13:43.6924499Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:13:43.6924946Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:13:43.6925409Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:13:43.6925894Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:13:43.6926369Z INFO:torch.distributed.distributed_c10d:Added key: 
store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:13:43.6926854Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:13:43.6927337Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:13:43.6927994Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:43.6928664Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:43.6929354Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:43.6930039Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:43.6930558Z INFO:torch.distributed._composable._ddp:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:13:43.6931015Z INFO:torch.distributed._composable._ddp:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:13:43.6931491Z INFO:torch.distributed._composable._ddp:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:13:43.6931955Z INFO:torch.distributed._composable._ddp:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:13:43.6932413Z INFO:torch.distributed._composable._ddp:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:13:43.6932854Z INFO:torch.distributed._composable._ddp:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:13:43.6933320Z INFO:torch.distributed._composable._ddp:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:13:43.6933860Z INFO:torch.distributed._composable._ddp:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:13:43.6934313Z INFO:torch.distributed._composable._ddp:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:13:43.6934785Z INFO:torch.distributed._composable._ddp:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:13:43.6935244Z INFO:torch.distributed._composable._ddp:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:13:43.6935701Z INFO:torch.distributed._composable._ddp:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:13:43.6936019Z ok (2.510s) 2023-01-11T22:13:43.6936454Z test_replicate_single_module (__main__.ReplicateTest) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 54914 2023-01-11T22:13:43.6937185Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 54915 2023-01-11T22:13:43.6937617Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 54916 2023-01-11T22:13:43.6938061Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 54917 2023-01-11T22:13:43.6938768Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:43.6939237Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:43.6939798Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:43.6940267Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:43.6940848Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:43.6941275Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:43.6941848Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:43.6942317Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 
2023-01-11T22:13:43.6942900Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:43.6943323Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:43.6943891Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:43.6944349Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:43.6944919Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:43.6945339Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:43.6945905Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:43.6946367Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:43.6946781Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:13:43.6947254Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:13:43.6947721Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:13:43.6948184Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:13:43.6948650Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:13:43.6949136Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:13:43.6949615Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:13:43.6950080Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:13:43.6950850Z 
INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:43.6951545Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:43.6952221Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:43.6952877Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:43.6953392Z INFO:torch.distributed._composable._ddp:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:13:43.6953866Z INFO:torch.distributed._composable._ddp:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:13:43.6954334Z INFO:torch.distributed._composable._ddp:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:13:43.6954786Z INFO:torch.distributed._composable._ddp:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:13:43.6955130Z ok (2.509s) 2023-01-11T22:13:43.6955616Z test_replicate_with_kwargs (__main__.ReplicateTest) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 55070 2023-01-11T22:13:43.6956129Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 55071 2023-01-11T22:13:43.6956575Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 55072 2023-01-11T22:13:43.6957008Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 55073 2023-01-11T22:13:43.6957619Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:43.6958050Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:43.6958623Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:43.6959091Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:43.6959668Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:43.6960089Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:43.6960654Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:43.6961115Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:43.6961666Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:43.6962102Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:43.6962659Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:13:43.6963110Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:13:43.6963660Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled 
tests 2023-01-11T22:13:43.6964125Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:43.6964711Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:13:43.6965149Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:13:43.6965587Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:13:43.6966056Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:13:43.6966519Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:13:43.6966963Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:13:43.6967520Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:13:43.6968013Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:13:43.6968502Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:13:43.6968968Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:13:43.6969618Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:43.6970304Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:43.6970966Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:43.6971647Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 
2023-01-11T22:13:43.6972222Z INFO:torch.distributed._composable._ddp:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:13:43.6972711Z INFO:torch.distributed._composable._ddp:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:13:43.6973161Z INFO:torch.distributed._composable._ddp:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:13:43.6973640Z INFO:torch.distributed._composable._ddp:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:13:43.6973978Z ok (2.509s) 2023-01-11T22:13:43.6974126Z 2023-01-11T22:13:43.6974401Z ---------------------------------------------------------------------- 2023-01-11T22:13:43.6974712Z Ran 5 tests in 13.962s 2023-01-11T22:13:43.6974878Z 2023-01-11T22:13:43.6974974Z OK 2023-01-11T22:13:43.6975109Z 2023-01-11T22:13:43.6975235Z Generating XML reports... 2023-01-11T22:13:43.6975847Z Generated XML report: test-reports/python-unittest/distributed._composable.test_replicate/TEST-ReplicateStateDictTest-20230111221329.xml 2023-01-11T22:13:43.6976860Z Generated XML report: test-reports/python-unittest/distributed._composable.test_replicate/TEST-ReplicateTest-20230111221329.xml 2023-01-11T22:13:43.6977216Z 2023-01-11T22:13:43.6977546Z ##[endgroup] 2023-01-11T22:13:43.6978150Z FINISHED PRINTING LOG FILE of distributed/_composable/test_replicate (/var/lib/jenkins/workspace/test/test-reports/distributed-_composable-test_replicate_1hpoka8r) 2023-01-11T22:13:43.6978520Z 2023-01-11T22:13:43.6978822Z Running distributed/tensor/parallel/test_parallelize_api ... [2023-01-11 22:13:43.688480] 2023-01-11T22:13:43.6979549Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/tensor/parallel/test_parallelize_api.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... 
[2023-01-11 22:13:43.688748] 2023-01-11T22:14:01.9744962Z 2023-01-11T22:14:01.9745714Z Expand the folded group to see the log file of distributed/tensor/parallel/test_parallelize_api 2023-01-11T22:14:01.9746772Z ##[group]PRINTING LOG FILE of distributed/tensor/parallel/test_parallelize_api (/var/lib/jenkins/workspace/test/test-reports/distributed-tensor-parallel-test_parallelize_api_hgtfvv2k) 2023-01-11T22:14:01.9747201Z 2023-01-11T22:14:01.9747354Z Running tests... 2023-01-11T22:14:01.9747847Z ---------------------------------------------------------------------- 2023-01-11T22:14:01.9748577Z Test results will be stored in test-reports/python-unittest/distributed.tensor.parallel.test_parallelize_api 2023-01-11T22:14:01.9749379Z test_creat_1d_device_mesh (__main__.TensorParallelAPITests) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:14:01.9749855Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 55261 2023-01-11T22:14:01.9750312Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 55262 2023-01-11T22:14:01.9751054Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 55263 2023-01-11T22:14:01.9751813Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 55264 2023-01-11T22:14:01.9752440Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:01.9752894Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:01.9753469Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:01.9753919Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:01.9754499Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:01.9754941Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 
2023-01-11T22:14:01.9755509Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:01.9755963Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:01.9756533Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:01.9757068Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:01.9757654Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:01.9758098Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:01.9758667Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:01.9759102Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:01.9759644Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:01.9760103Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:01.9760540Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:14:01.9761014Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:14:01.9761460Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:14:01.9761915Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:14:01.9762306Z skip: Need at least 4 CUDA devices (4.018s) 2023-01-11T22:14:01.9762779Z test_creat_1d_device_mesh_error (__main__.TensorParallelAPITests) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 55397 2023-01-11T22:14:01.9763322Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 55398 2023-01-11T22:14:01.9763766Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 55399 2023-01-11T22:14:01.9764203Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 55400 2023-01-11T22:14:01.9764797Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:01.9765248Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:01.9765805Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:01.9766249Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:01.9766800Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:01.9767264Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:01.9767839Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:01.9768279Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:01.9768923Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:01.9769371Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:01.9769933Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:01.9770373Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:01.9770941Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
56 slow tests 2023-01-11T22:14:01.9771378Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:01.9771921Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:01.9772378Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:01.9772817Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:14:01.9773284Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:14:01.9773786Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:14:01.9774249Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:14:01.9774637Z skip: Need at least 4 CUDA devices (2.409s) 2023-01-11T22:14:01.9775110Z test_linear_col_wise_parallel (__main__.TensorParallelAPITests) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 55533 2023-01-11T22:14:01.9775645Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 55534 2023-01-11T22:14:01.9776089Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 55535 2023-01-11T22:14:01.9776525Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 55536 2023-01-11T22:14:01.9777523Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:01.9777975Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:01.9778602Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:01.9779065Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:01.9779635Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:01.9780060Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:01.9780627Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:01.9781088Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:01.9781643Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:01.9782080Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:01.9782645Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:01.9783102Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:01.9783650Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:01.9784089Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:01.9784646Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:01.9785100Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:01.9785514Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:14:01.9786098Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:14:01.9786565Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:14:01.9787009Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:14:01.9787391Z skip: Need at least 4 CUDA devices (2.410s) 
2023-01-11T22:14:01.9787879Z test_linear_row_wise_parallel (__main__.TensorParallelAPITests) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 55669 2023-01-11T22:14:01.9788416Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 55670 2023-01-11T22:14:01.9788843Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 55671 2023-01-11T22:14:01.9789276Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 55672 2023-01-11T22:14:01.9789889Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:01.9790321Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:01.9790955Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:01.9791433Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:01.9792008Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:01.9792430Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:01.9792991Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:01.9793447Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:01.9794014Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:01.9794441Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:01.9795005Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:01.9795458Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:01.9796008Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:01.9796449Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:01.9797016Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:01.9797477Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:01.9797892Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:14:01.9798363Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:14:01.9798826Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:14:01.9799270Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:14:01.9799663Z skip: Need at least 4 CUDA devices (2.410s) 2023-01-11T22:14:01.9800145Z test_parallelize_mlp (__main__.TensorParallelAPITests) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 55805 2023-01-11T22:14:01.9800673Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 55806 2023-01-11T22:14:01.9801101Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 55807 2023-01-11T22:14:01.9801533Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 55808 2023-01-11T22:14:01.9802136Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:01.9802632Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:01.9803206Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:01.9803670Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:01.9804242Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:01.9804662Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:01.9805224Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:01.9805683Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:01.9806250Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:01.9806669Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:01.9807237Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:01.9807746Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:01.9808310Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
56 slow tests 2023-01-11T22:14:01.9808746Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:01.9809307Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:01.9809766Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:01.9810176Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:14:01.9810642Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:14:01.9811111Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:14:01.9811555Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:14:01.9811945Z skip: Need at least 4 CUDA devices (2.310s) 2023-01-11T22:14:01.9812437Z test_parallelize_mlp_error (__main__.TensorParallelAPITests) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 55941 2023-01-11T22:14:01.9812973Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 55942 2023-01-11T22:14:01.9813398Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 55943 2023-01-11T22:14:01.9813829Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 55944 2023-01-11T22:14:01.9814430Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:01.9814930Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:01.9815487Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:01.9815954Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:01.9816526Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:01.9817419Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:01.9818003Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:01.9818466Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:01.9819035Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:01.9819459Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:01.9820132Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:01.9820579Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:01.9821127Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:01.9821589Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:01.9822166Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:01.9822624Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:01.9823039Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:14:01.9823502Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:14:01.9823967Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:14:01.9824406Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:14:01.9824868Z skip: Need at least 4 CUDA devices (2.409s) 
2023-01-11T22:14:01.9825075Z 2023-01-11T22:14:01.9825349Z ---------------------------------------------------------------------- 2023-01-11T22:14:01.9825675Z Ran 6 tests in 15.966s 2023-01-11T22:14:01.9825836Z 2023-01-11T22:14:01.9825928Z OK (skipped=6) 2023-01-11T22:14:01.9826081Z 2023-01-11T22:14:01.9826206Z Generating XML reports... 2023-01-11T22:14:01.9826865Z Generated XML report: test-reports/python-unittest/distributed.tensor.parallel.test_parallelize_api/TEST-TensorParallelAPITests-20230111221345.xml 2023-01-11T22:14:01.9827267Z 2023-01-11T22:14:01.9827580Z ##[endgroup] 2023-01-11T22:14:01.9828252Z FINISHED PRINTING LOG FILE of distributed/tensor/parallel/test_parallelize_api (/var/lib/jenkins/workspace/test/test-reports/distributed-tensor-parallel-test_parallelize_api_hgtfvv2k) 2023-01-11T22:14:01.9828669Z 2023-01-11T22:14:01.9828957Z Running distributed/fsdp/test_fsdp_tp_integration ... [2023-01-11 22:14:01.974620] 2023-01-11T22:14:01.9829662Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_fsdp_tp_integration.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:14:01.974871] 2023-01-11T22:14:25.6697141Z 2023-01-11T22:14:25.6697866Z Expand the folded group to see the log file of distributed/fsdp/test_fsdp_tp_integration 2023-01-11T22:14:25.6700569Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_fsdp_tp_integration (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_tp_integration_esizilor) 2023-01-11T22:14:25.6701016Z 2023-01-11T22:14:25.6701248Z Running tests... 2023-01-11T22:14:25.6701895Z ---------------------------------------------------------------------- 2023-01-11T22:14:25.6702855Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_tp_integration 2023-01-11T22:14:25.6703431Z test_fsdp_tp_checkpoint_integration (__main__.TestTPFSDPIntegration) 2023-01-11T22:14:25.6704063Z Tests checkpointing for TP + FSDP integration. ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:14:25.6704596Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56112 2023-01-11T22:14:25.6705060Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56113 2023-01-11T22:14:25.6705729Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:25.6706167Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:25.6706747Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:25.6707396Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:25.6708073Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:25.6708754Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:25.6709344Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:25.6710019Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:25.6710926Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:14:25.6711997Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:14:25.6712946Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:14:25.6713917Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:14:25.6714425Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:14:25.6714903Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:14:25.6715376Z dist init r=1, world=2 2023-01-11T22:14:25.6715714Z dist init r=0, world=2 2023-01-11T22:14:25.6715985Z skip: Need at least 4 CUDA devices (5.517s) 2023-01-11T22:14:25.6716453Z test_fsdp_tp_integration_tensor_parallel_size_2_cpu_offload_CPUOffload(offload_params=False) (__main__.TestTPFSDPIntegration) 2023-01-11T22:14:25.6717207Z Tests training for TP + FSDP integration by comparing an FSDP-only ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56191 2023-01-11T22:14:25.6717733Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56192 2023-01-11T22:14:25.6718347Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:25.6718801Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:25.6719422Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:25.6719897Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:25.6720456Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:25.6720907Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:25.6721478Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:25.6721956Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:25.6722394Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:14:25.6722890Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:14:25.6723551Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:14:25.6724227Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:14:25.6724753Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:14:25.6725225Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:14:25.6725581Z dist init r=0, world=2 2023-01-11T22:14:25.6725815Z dist init r=1, world=2 2023-01-11T22:14:25.6726106Z skip: Need at least 4 CUDA devices (3.912s) 2023-01-11T22:14:25.6726567Z test_fsdp_tp_integration_tensor_parallel_size_2_cpu_offload_CPUOffload(offload_params=True) (__main__.TestTPFSDPIntegration) 2023-01-11T22:14:25.6727296Z Tests training for TP + FSDP integration by comparing an FSDP-only ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56270 2023-01-11T22:14:25.6727920Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56271 2023-01-11T22:14:25.6728536Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:25.6728985Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:25.6729538Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:25.6730005Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:25.6730580Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:25.6731027Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:25.6731576Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:25.6732041Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:25.6732551Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:14:25.6733042Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:14:25.6733698Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:14:25.6734378Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:14:25.6734896Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:14:25.6735350Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:14:25.6735699Z dist init r=1, world=2 2023-01-11T22:14:25.6735953Z dist init r=0, world=2 2023-01-11T22:14:25.6736231Z skip: Need at least 4 CUDA devices (4.012s) 2023-01-11T22:14:25.6737050Z test_fsdp_tp_integration_tensor_parallel_size_4_cpu_offload_CPUOffload(offload_params=False) (__main__.TestTPFSDPIntegration) 2023-01-11T22:14:25.6737817Z Tests training for TP + FSDP integration by comparing an FSDP-only ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56349 2023-01-11T22:14:25.6738356Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56350 2023-01-11T22:14:25.6738945Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:25.6739395Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:25.6739969Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:25.6740436Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:25.6740998Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:25.6741442Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:25.6742018Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:25.6742463Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:25.6742922Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:14:25.6743418Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:14:25.6744073Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:14:25.6744738Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:14:25.6745374Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:14:25.6745854Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:14:25.6746214Z dist init r=0, world=2 2023-01-11T22:14:25.6746449Z dist init r=1, world=2 2023-01-11T22:14:25.6746737Z skip: Need at least 4 CUDA devices (4.012s) 2023-01-11T22:14:25.6747195Z test_fsdp_tp_integration_tensor_parallel_size_4_cpu_offload_CPUOffload(offload_params=True) (__main__.TestTPFSDPIntegration) 2023-01-11T22:14:25.6747924Z Tests training for TP + FSDP integration by comparing an FSDP-only ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56428 2023-01-11T22:14:25.6748462Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56429 2023-01-11T22:14:25.6749070Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:25.6749523Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:25.6750154Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:25.6750637Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:25.6751216Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:25.6751640Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:25.6752211Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:25.6752673Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:25.6753145Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:14:25.6753628Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:14:25.6754292Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:14:25.6754980Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:14:25.6755503Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:14:25.6755952Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:14:25.6756300Z dist init r=0, world=2 2023-01-11T22:14:25.6756553Z dist init r=1, world=2 2023-01-11T22:14:25.6756824Z skip: Need at least 4 CUDA devices (3.912s) 2023-01-11T22:14:25.6757020Z 2023-01-11T22:14:25.6757293Z ---------------------------------------------------------------------- 2023-01-11T22:14:25.6757624Z Ran 5 tests in 21.366s 2023-01-11T22:14:25.6757786Z 2023-01-11T22:14:25.6757900Z OK (skipped=5) 2023-01-11T22:14:25.6758034Z 2023-01-11T22:14:25.6758162Z Generating XML reports... 2023-01-11T22:14:25.6758797Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_tp_integration/TEST-TestTPFSDPIntegration-20230111221403.xml 2023-01-11T22:14:25.6759174Z 2023-01-11T22:14:25.6759505Z ##[endgroup] 2023-01-11T22:14:25.6760119Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_tp_integration (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_tp_integration_esizilor) 2023-01-11T22:14:25.6760492Z 2023-01-11T22:14:25.6760773Z Running distributed/checkpoint/test_checkpoint ... [2023-01-11 22:14:25.669851] 2023-01-11T22:14:25.6761474Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/checkpoint/test_checkpoint.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:14:25.670099] 2023-01-11T22:14:51.9280565Z 2023-01-11T22:14:51.9281061Z Expand the folded group to see the log file of distributed/checkpoint/test_checkpoint 2023-01-11T22:14:51.9283359Z ##[group]PRINTING LOG FILE of distributed/checkpoint/test_checkpoint (/var/lib/jenkins/workspace/test/test-reports/distributed-checkpoint-test_checkpoint_vtksq92v) 2023-01-11T22:14:51.9283768Z 2023-01-11T22:14:51.9283899Z Running tests... 
2023-01-11T22:14:51.9285654Z ---------------------------------------------------------------------- 2023-01-11T22:14:51.9286293Z Test results will be stored in test-reports/python-unittest/distributed.checkpoint.test_checkpoint 2023-01-11T22:14:51.9286860Z test_default_metadata (__main__.TestDistributedCheckpointing) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:14:51.9287360Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56542 2023-01-11T22:14:51.9287794Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56543 2023-01-11T22:14:51.9288413Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:51.9290614Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:51.9291305Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:51.9292007Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:51.9292627Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:51.9293063Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:51.9293640Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:51.9294113Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:51.9294549Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:14:51.9295002Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:14:51.9295497Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:14:51.9295999Z INFO:torch.distributed.distributed_c10d:Added key: 
store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:14:51.9297190Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:14:51.9298255Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:14:51.9298650Z ok (5.549s) 2023-01-11T22:14:51.9299130Z test_tensor_metadata_with_missing_rank_spec (__main__.TestDistributedCheckpointing) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56621 2023-01-11T22:14:51.9299682Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56622 2023-01-11T22:14:51.9300316Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:51.9300774Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:51.9301350Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:51.9301819Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:51.9302380Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:51.9302826Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:51.9316233Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:51.9316751Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:51.9317186Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:14:51.9317805Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:14:51.9318299Z 
INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:14:51.9318787Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:14:51.9319435Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:14:51.9320114Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:14:51.9320507Z ok (3.912s) 2023-01-11T22:14:51.9320946Z test_dummy_reader_works (__main__.TestDistributedFailure) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56700 2023-01-11T22:14:51.9321461Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56701 2023-01-11T22:14:51.9321911Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 56702 2023-01-11T22:14:51.9322362Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 56703 2023-01-11T22:14:51.9323049Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:51.9323495Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:51.9324075Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:51.9324544Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:51.9325103Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:51.9325551Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:51.9326119Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:51.9326592Z warnings.warn(f"loaded 
{len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:51.9327149Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:51.9327598Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:51.9328173Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:51.9328639Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:51.9329195Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:51.9329643Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:51.9330211Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:51.9330745Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:51.9331180Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:14:51.9331638Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:14:51.9332097Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:14:51.9332563Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:14:51.9332954Z skip: Need at least 4 CUDA devices (2.410s) 2023-01-11T22:14:51.9333415Z test_dummy_writer_works (__main__.TestDistributedFailure) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56836 2023-01-11T22:14:51.9333941Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56837 2023-01-11T22:14:51.9334386Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 56838 2023-01-11T22:14:51.9334894Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 56839 2023-01-11T22:14:51.9335501Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:51.9335951Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:51.9336522Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:51.9337744Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:51.9338335Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:51.9338779Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:51.9339330Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:51.9339795Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:51.9340465Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:51.9340917Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:51.9341468Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:51.9341927Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:51.9342495Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
56 slow tests 2023-01-11T22:14:51.9342934Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:51.9343482Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:51.9343944Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:51.9344384Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:14:51.9344841Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:14:51.9345310Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:14:51.9345766Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:14:51.9346158Z skip: Need at least 4 CUDA devices (2.410s) 2023-01-11T22:14:51.9346623Z test_load_error_handling (__main__.TestDistributedFailure) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56972 2023-01-11T22:14:51.9347152Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56973 2023-01-11T22:14:51.9347598Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 56974 2023-01-11T22:14:51.9348024Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 56975 2023-01-11T22:14:51.9348633Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:51.9349084Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:51.9349656Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:51.9350104Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:51.9350673Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:51.9351115Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:51.9351683Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:51.9352124Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:51.9352787Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:51.9353230Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:51.9353779Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:51.9354240Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:51.9354811Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:51.9355249Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:51.9355792Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:51.9356250Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:51.9356686Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:14:51.9357139Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:14:51.9357654Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:14:51.9358120Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:14:51.9358511Z skip: Need at least 4 CUDA devices (2.410s) 
2023-01-11T22:14:51.9358981Z test_load_error_handling_no_dist (__main__.TestDistributedFailure) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 57108 2023-01-11T22:14:51.9359514Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 57109 2023-01-11T22:14:51.9359960Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 57110 2023-01-11T22:14:51.9360379Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 57111 2023-01-11T22:14:51.9360995Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:51.9361447Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:51.9362016Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:51.9362466Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:51.9363037Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:51.9363477Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:51.9364043Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:51.9364486Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:51.9365067Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:51.9365505Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:51.9366057Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:51.9366519Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:51.9367093Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:51.9367530Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:51.9368077Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:51.9368537Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:51.9368975Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:14:51.9369490Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:14:51.9369959Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:14:51.9370419Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:14:51.9370763Z ok (2.409s) 2023-01-11T22:14:51.9371183Z test_save_error_handling (__main__.TestDistributedFailure) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 57244 2023-01-11T22:14:51.9371712Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 57245 2023-01-11T22:14:51.9372161Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 57246 2023-01-11T22:14:51.9372603Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 57247 2023-01-11T22:14:51.9373192Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:51.9373640Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:51.9374261Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:51.9374721Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:51.9375293Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:51.9375734Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:51.9376301Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:51.9377329Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:51.9377967Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:51.9378417Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:51.9378968Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:51.9379429Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:51.9379995Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
56 slow tests 2023-01-11T22:14:51.9380436Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:51.9380984Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:51.9381443Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:51.9381876Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:14:51.9382347Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:14:51.9382797Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:14:51.9383258Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:14:51.9383650Z skip: Need at least 4 CUDA devices (2.410s) 2023-01-11T22:14:51.9384122Z test_save_error_handling_no_dist (__main__.TestDistributedFailure) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 57380 2023-01-11T22:14:51.9384656Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 57381 2023-01-11T22:14:51.9385101Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 57382 2023-01-11T22:14:51.9385541Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 57383 2023-01-11T22:14:51.9386129Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:51.9386682Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:51.9387259Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:51.9387707Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:51.9388283Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:51.9388725Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:51.9389293Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:51.9389738Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:51.9390305Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:51.9390748Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:51.9391292Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:51.9391820Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:51.9392403Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:14:51.9392843Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:51.9393385Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:51.9393845Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:51.9394277Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:14:51.9394745Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:14:51.9395196Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:14:51.9395656Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:14:51.9395997Z ok (2.410s) 2023-01-11T22:14:51.9396150Z 
2023-01-11T22:14:51.9396406Z ----------------------------------------------------------------------
2023-01-11T22:14:51.9396733Z Ran 8 tests in 23.920s
2023-01-11T22:14:51.9396897Z
2023-01-11T22:14:51.9397007Z OK (skipped=4)
2023-01-11T22:14:51.9397161Z
2023-01-11T22:14:51.9397287Z Generating XML reports...
2023-01-11T22:14:51.9397929Z Generated XML report: test-reports/python-unittest/distributed.checkpoint.test_checkpoint/TEST-TestDistributedCheckpointing-20230111221427.xml
2023-01-11T22:14:51.9398792Z Generated XML report: test-reports/python-unittest/distributed.checkpoint.test_checkpoint/TEST-TestDistributedFailure-20230111221427.xml
2023-01-11T22:14:51.9399166Z
2023-01-11T22:14:51.9399629Z ##[endgroup]
2023-01-11T22:14:51.9400249Z FINISHED PRINTING LOG FILE of distributed/checkpoint/test_checkpoint (/var/lib/jenkins/workspace/test/test-reports/distributed-checkpoint-test_checkpoint_vtksq92v)
2023-01-11T22:14:51.9400628Z
2023-01-11T22:14:51.9400914Z Running distributed/tensor/parallel/test_tp_style ... [2023-01-11 22:14:51.928136]
2023-01-11T22:14:51.9401611Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/tensor/parallel/test_tp_style.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:14:51.928384]
2023-01-11T22:15:14.7384327Z
2023-01-11T22:15:14.7384852Z Expand the folded group to see the log file of distributed/tensor/parallel/test_tp_style
2023-01-11T22:15:14.7388917Z ##[group]PRINTING LOG FILE of distributed/tensor/parallel/test_tp_style (/var/lib/jenkins/workspace/test/test-reports/distributed-tensor-parallel-test_tp_style_igz325k_)
2023-01-11T22:15:14.7389336Z
2023-01-11T22:15:14.7389435Z Running tests...
2023-01-11T22:15:14.7390201Z ---------------------------------------------------------------------- 2023-01-11T22:15:14.7390784Z Test results will be stored in test-reports/python-unittest/distributed.tensor.parallel.test_tp_style 2023-01-11T22:15:14.7391347Z test_colwise_parallel_style (__main__.TensorParallelStyleTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:15:14.7391830Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 57551 2023-01-11T22:15:14.7392279Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 57552 2023-01-11T22:15:14.7392722Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 57553 2023-01-11T22:15:14.7393642Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 57554 2023-01-11T22:15:14.7394377Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:14.7394839Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:14.7395432Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:14.7396056Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:14.7396681Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:14.7397136Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:14.7397722Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:14.7398174Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:14.7398753Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:14.7399190Z warnings.warn(f"loaded {len(slow_tests_dict)} 
slow tests") 2023-01-11T22:15:14.7399767Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:14.7400218Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:14.7400796Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:14.7402375Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:14.7403066Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:14.7403567Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:14.7404013Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:15:14.7404494Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:15:14.7404941Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:15:14.7405415Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:15:14.7405799Z skip: Need at least 4 CUDA devices (3.994s) 2023-01-11T22:15:14.7406280Z test_make_input_replicate_1d (__main__.TensorParallelStyleTest) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 57687 2023-01-11T22:15:14.7406824Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 57688 2023-01-11T22:15:14.7407265Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 57689 2023-01-11T22:15:14.7407703Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 57690 2023-01-11T22:15:14.7408297Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:14.7408744Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:14.7409311Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:14.7409902Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:14.7410467Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:14.7410909Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:14.7411477Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:14.7412018Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:14.7412590Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:14.7413017Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:14.7413582Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:14.7414048Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:14.7414682Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
56 slow tests 2023-01-11T22:15:14.7415115Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:14.7415679Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:14.7416135Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:14.7416912Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:15:14.7417432Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:15:14.7417892Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:15:14.7418353Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:15:14.7418728Z skip: Need at least 4 CUDA devices (2.309s) 2023-01-11T22:15:14.7419217Z test_make_input_shard_1d (__main__.TensorParallelStyleTest) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 57823 2023-01-11T22:15:14.7419747Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 57824 2023-01-11T22:15:14.7420170Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 57825 2023-01-11T22:15:14.7420610Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 57826 2023-01-11T22:15:14.7421219Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:14.7421661Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:14.7422211Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:14.7422678Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:14.7423250Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:14.7423688Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:14.7424235Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:14.7424691Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:14.7425257Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:14.7425672Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:14.7426238Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:14.7426830Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:14.7427404Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:14.7427829Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:14.7428392Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:14.7428846Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:14.7429255Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:15:14.7429722Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:15:14.7430175Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:15:14.7430636Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:15:14.7431009Z skip: Need at least 4 CUDA devices (2.409s) 
2023-01-11T22:15:14.7431570Z test_make_output_replicate_1d (__main__.TensorParallelStyleTest) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 57959 2023-01-11T22:15:14.7432123Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 57960 2023-01-11T22:15:14.7432565Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 57961 2023-01-11T22:15:14.7432985Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 57962 2023-01-11T22:15:14.7433581Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:14.7434024Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:14.7434575Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:14.7435042Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:14.7435609Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:14.7436051Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:14.7436600Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:14.7437053Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:14.7437617Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:14.7438033Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:14.7438595Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:14.7439048Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:14.7439615Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:14.7440032Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:14.7440594Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:14.7441046Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:14.7441476Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:15:14.7441926Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:15:14.7442380Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:15:14.7442837Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:15:14.7443210Z skip: Need at least 4 CUDA devices (2.409s) 2023-01-11T22:15:14.7443761Z test_make_output_shard_1d (__main__.TensorParallelStyleTest) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 58095 2023-01-11T22:15:14.7444296Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 58096 2023-01-11T22:15:14.7444738Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 58097 2023-01-11T22:15:14.7445160Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 58098 2023-01-11T22:15:14.7445758Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:14.7446204Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:14.7446754Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:14.7447217Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:14.7447788Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:14.7448281Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:14.7448843Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:14.7449303Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:14.7449872Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:14.7450289Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:14.7450847Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:14.7451301Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:14.7451864Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
56 slow tests 2023-01-11T22:15:14.7452291Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:14.7452854Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:14.7453305Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:14.7453733Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:15:14.7454183Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:15:14.7454635Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:15:14.7455097Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:15:14.7455466Z skip: Need at least 4 CUDA devices (2.410s) 2023-01-11T22:15:14.7455949Z test_make_output_tensor (__main__.TensorParallelStyleTest) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 58231 2023-01-11T22:15:14.7456481Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 58232 2023-01-11T22:15:14.7457247Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 58233 2023-01-11T22:15:14.7457666Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 58234 2023-01-11T22:15:14.7458270Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:14.7458716Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:14.7459265Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:14.7459727Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:14.7460296Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:14.7460832Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:14.7461384Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:14.7461843Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:14.7462408Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:14.7462839Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:14.7463386Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:14.7463842Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:14.7464404Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:14.7464826Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:14.7465448Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:14.7465915Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:14.7466343Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:15:14.7466791Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:15:14.7467247Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:15:14.7467703Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:15:14.7468073Z skip: Need at least 4 CUDA devices (2.310s) 
2023-01-11T22:15:14.7468561Z test_prepare_output_error (__main__.TensorParallelStyleTest) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 58367 2023-01-11T22:15:14.7469101Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 58368 2023-01-11T22:15:14.7469542Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 58369 2023-01-11T22:15:14.7469962Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 58370 2023-01-11T22:15:14.7470562Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:14.7471003Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:14.7471552Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:14.7472017Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:14.7472586Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:14.7473026Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:14.7473577Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:14.7474036Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:14.7474600Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:14.7475032Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:14.7475573Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:14.7476026Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:14.7476586Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:14.7477071Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:14.7477634Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:14.7478094Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:14.7478522Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:15:14.7478966Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:15:14.7479418Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:15:14.7479876Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:15:14.7480243Z skip: Need at least 4 CUDA devices (2.409s) 2023-01-11T22:15:14.7480731Z test_rowwise_parallel_style (__main__.TensorParallelStyleTest) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 58503 2023-01-11T22:15:14.7481273Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 58504 2023-01-11T22:15:14.7481762Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 58505 2023-01-11T22:15:14.7482192Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 58506 2023-01-11T22:15:14.7482787Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:14.7483229Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:14.7483794Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:14.7484241Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:14.7484808Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:14.7485247Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:14.7485781Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:14.7486227Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:14.7486790Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:14.7487248Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:14.7487807Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:14.7488263Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:14.7488827Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
56 slow tests 2023-01-11T22:15:14.7489245Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:14.7489810Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:14.7490268Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:14.7490697Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:15:14.7491146Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:15:14.7491599Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:15:14.7492064Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:15:14.7492447Z skip: Need at least 4 CUDA devices (2.309s) 2023-01-11T22:15:14.7492624Z 2023-01-11T22:15:14.7492899Z ---------------------------------------------------------------------- 2023-01-11T22:15:14.7493227Z Ran 8 tests in 20.561s 2023-01-11T22:15:14.7493473Z 2023-01-11T22:15:14.7493582Z OK (skipped=8) 2023-01-11T22:15:14.7493733Z 2023-01-11T22:15:14.7493840Z Generating XML reports... 2023-01-11T22:15:14.7494491Z Generated XML report: test-reports/python-unittest/distributed.tensor.parallel.test_tp_style/TEST-TensorParallelStyleTest-20230111221453.xml 2023-01-11T22:15:14.7494885Z 2023-01-11T22:15:14.7495326Z ##[endgroup] 2023-01-11T22:15:14.7495956Z FINISHED PRINTING LOG FILE of distributed/tensor/parallel/test_tp_style (/var/lib/jenkins/workspace/test/test-reports/distributed-tensor-parallel-test_tp_style_igz325k_) 2023-01-11T22:15:14.7496344Z 2023-01-11T22:15:14.7497024Z Running distributed/_shard/sharded_tensor/ops/test_matrix_ops ... 
[2023-01-11 22:15:14.738602] 2023-01-11T22:15:14.7497778Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/_shard/sharded_tensor/ops/test_matrix_ops.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:15:14.738844] 2023-01-11T22:15:44.9928602Z 2023-01-11T22:15:44.9929382Z Expand the folded group to see the log file of distributed/_shard/sharded_tensor/ops/test_matrix_ops 2023-01-11T22:15:44.9930760Z ##[group]PRINTING LOG FILE of distributed/_shard/sharded_tensor/ops/test_matrix_ops (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-sharded_tensor-ops-test_matrix_ops_aa1br6jq) 2023-01-11T22:15:44.9931203Z 2023-01-11T22:15:44.9933840Z Running tests... 2023-01-11T22:15:44.9934780Z ---------------------------------------------------------------------- 2023-01-11T22:15:44.9935432Z Test results will be stored in test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_matrix_ops 2023-01-11T22:15:44.9936008Z test_sharded_tensor_contiguous (__main__.TestShardedTensorMatrixOps) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:15:44.9936516Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 58674 2023-01-11T22:15:44.9937235Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 58675 2023-01-11T22:15:44.9937689Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 58676 2023-01-11T22:15:44.9938143Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 58677 2023-01-11T22:15:44.9938781Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:44.9939216Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:44.9939793Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:44.9940262Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:44.9940840Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:44.9941268Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:44.9941855Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:44.9942326Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:44.9942901Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:44.9943328Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:44.9943899Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:44.9944358Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:44.9944917Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:44.9945359Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:44.9945921Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:44.9946574Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:44.9946993Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:15:44.9947474Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:15:44.9947961Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:15:44.9948420Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:15:44.9948791Z skip: Need at least 4 CUDA devices (4.066s) 2023-01-11T22:15:44.9949297Z test_sharded_tensor_layer_norm (__main__.TestShardedTensorMatrixOps) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 58810 2023-01-11T22:15:44.9949852Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 58811 2023-01-11T22:15:44.9950277Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 58812 2023-01-11T22:15:44.9950732Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 58813 2023-01-11T22:15:44.9951439Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:44.9951901Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:44.9952466Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:44.9952932Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:44.9953503Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:44.9953945Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:44.9954492Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:44.9954956Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:44.9955523Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:44.9955948Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:44.9956514Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:44.9956972Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:44.9957540Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
56 slow tests 2023-01-11T22:15:44.9957959Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:44.9958524Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:44.9958981Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:44.9959510Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:15:44.9959981Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:15:44.9960425Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:15:44.9960890Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:15:44.9961275Z skip: Need at least 4 CUDA devices (2.410s) 2023-01-11T22:15:44.9961781Z test_sharded_tensor_layer_norm_error (__main__.TestShardedTensorMatrixOps) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 58946 2023-01-11T22:15:44.9962318Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 58947 2023-01-11T22:15:44.9962759Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 58948 2023-01-11T22:15:44.9963271Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 58949 2023-01-11T22:15:44.9963882Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:44.9964314Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:44.9964882Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:44.9965344Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:44.9965897Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:44.9966336Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:44.9966896Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:44.9967352Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:44.9967909Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:44.9968408Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:44.9968979Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:44.9969421Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:44.9969987Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
56 slow tests 2023-01-11T22:15:44.9970423Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:44.9970987Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:44.9971423Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:44.9971860Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:15:44.9972331Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:15:44.9972797Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:15:44.9973239Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:15:44.9973621Z skip: Need at least 4 CUDA devices (2.410s) 2023-01-11T22:15:44.9974119Z test_sharded_tensor_masked_fill (__main__.TestShardedTensorMatrixOps) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 59082 2023-01-11T22:15:44.9974648Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 59083 2023-01-11T22:15:44.9975088Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 59084 2023-01-11T22:15:44.9975526Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 59085 2023-01-11T22:15:44.9976137Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:44.9976835Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:44.9977430Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:44.9977895Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:44.9978449Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:44.9978886Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:44.9979451Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:44.9979909Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:44.9980564Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:44.9981003Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:44.9981570Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:44.9982027Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:44.9982577Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
56 slow tests 2023-01-11T22:15:44.9983014Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:44.9983574Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:44.9984012Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:44.9984445Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:15:44.9984917Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:15:44.9985447Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:15:44.9985905Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:15:44.9986285Z skip: Need at least 4 CUDA devices (2.409s) 2023-01-11T22:15:44.9986792Z test_sharded_tensor_masked_fill_error (__main__.TestShardedTensorMatrixOps) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 59218 2023-01-11T22:15:44.9987327Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 59219 2023-01-11T22:15:44.9987772Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 59220 2023-01-11T22:15:44.9988203Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 59221 2023-01-11T22:15:44.9988815Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:44.9989244Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:44.9989815Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:44.9990275Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:44.9990846Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:44.9991269Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:44.9991831Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:44.9992290Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:44.9992838Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:44.9993281Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:44.9993842Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:44.9994294Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:44.9994843Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
56 slow tests 2023-01-11T22:15:44.9995277Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:44.9995838Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:44.9996276Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:44.9996707Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:15:44.9997243Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:15:44.9997710Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:15:44.9998154Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:15:44.9998535Z skip: Need at least 4 CUDA devices (2.310s) 2023-01-11T22:15:44.9999032Z test_sharded_tensor_softmax (__main__.TestShardedTensorMatrixOps) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 59354 2023-01-11T22:15:44.9999572Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 59355 2023-01-11T22:15:44.9999995Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 59356 2023-01-11T22:15:45.0000425Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 59357 2023-01-11T22:15:45.0001025Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:45.0001453Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:45.0002075Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:45.0002546Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:45.0003115Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:45.0003534Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:45.0004097Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:45.0004552Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:45.0005104Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:45.0005540Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:45.0006105Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:45.0006559Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:45.0007110Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:45.0007551Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:45.0008109Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:45.0008544Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:45.0008973Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:15:45.0009442Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:15:45.0009902Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:15:45.0010350Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:15:45.0010733Z skip: Need at least 4 CUDA devices (2.410s) 
2023-01-11T22:15:45.0011235Z test_sharded_tensor_transpose (__main__.TestShardedTensorMatrixOps) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 59490 2023-01-11T22:15:45.0011780Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 59491 2023-01-11T22:15:45.0012202Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 59492 2023-01-11T22:15:45.0012633Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 59493 2023-01-11T22:15:45.0013236Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:45.0013724Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:45.0014295Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:45.0014754Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:45.0015322Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:45.0015741Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:45.0016297Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:45.0017108Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:45.0017672Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:45.0018109Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:45.0018711Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:45.0019249Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:45.0019816Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:45.0020260Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:45.0020820Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:45.0021277Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:45.0021687Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:15:45.0022154Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:15:45.0022657Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:15:45.0023104Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:15:45.0023489Z skip: Need at least 4 CUDA devices (2.410s) 2023-01-11T22:15:45.0023998Z test_sharded_tensor_transpose_error (__main__.TestShardedTensorMatrixOps) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 59626 2023-01-11T22:15:45.0024578Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 59627 2023-01-11T22:15:45.0025002Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 59628 2023-01-11T22:15:45.0025433Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 59629 2023-01-11T22:15:45.0026037Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:45.0026467Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:45.0027034Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:45.0027498Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:45.0028068Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:45.0028488Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:45.0029054Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:45.0029512Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:45.0030078Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:45.0030495Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:45.0031156Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:45.0031614Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:45.0032165Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
56 slow tests 2023-01-11T22:15:45.0032598Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:45.0033159Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:45.0033615Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:45.0034027Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:15:45.0034493Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:15:45.0034957Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:15:45.0035399Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:15:45.0035842Z skip: Need at least 4 CUDA devices (2.409s) 2023-01-11T22:15:45.0036350Z test_sharded_tensor_type_as (__main__.TestShardedTensorMatrixOps) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 59762 2023-01-11T22:15:45.0036889Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 59763 2023-01-11T22:15:45.0037314Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 59764 2023-01-11T22:15:45.0037746Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 59765 2023-01-11T22:15:45.0038350Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:45.0038792Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:45.0039348Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:45.0039812Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:45.0040377Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:45.0040798Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:45.0041356Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:45.0041810Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:45.0042375Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:45.0042790Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:45.0043358Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:45.0043808Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:45.0044359Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:45.0044791Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:45.0045346Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:45.0045796Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:45.0046208Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:15:45.0046673Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:15:45.0047134Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:15:45.0047658Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:15:45.0048043Z skip: Need at least 4 CUDA devices (2.410s) 
2023-01-11T22:15:45.0048534Z test_sharded_tensor_view (__main__.TestShardedTensorMatrixOps) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 59898 2023-01-11T22:15:45.0049071Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 59899 2023-01-11T22:15:45.0049494Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 59900 2023-01-11T22:15:45.0049925Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 59901 2023-01-11T22:15:45.0050528Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:45.0050972Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:45.0051518Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:45.0051978Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:45.0052596Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:45.0053023Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:45.0053591Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:45.0054043Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:45.0054610Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:45.0055027Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:45.0055586Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:45.0056049Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:45.0056880Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:45.0057328Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:45.0057895Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:45.0058348Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:45.0058760Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:15:45.0059224Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:15:45.0059684Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:15:45.0060149Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:15:45.0060517Z skip: Need at least 4 CUDA devices (2.410s) 2023-01-11T22:15:45.0061014Z test_sharded_tensor_view_error (__main__.TestShardedTensorMatrixOps) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60034 2023-01-11T22:15:45.0061556Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 60035 2023-01-11T22:15:45.0061979Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 60036 2023-01-11T22:15:45.0062408Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 60037 2023-01-11T22:15:45.0063009Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:45.0063452Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:45.0064000Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:45.0064553Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:45.0065130Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:45.0065548Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:45.0066110Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:45.0066562Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:45.0067127Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:15:45.0067541Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:45.0068103Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:45.0068563Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:45.0069194Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
56 slow tests 2023-01-11T22:15:45.0069622Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:45.0070184Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:45.0070637Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:45.0071047Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:15:45.0071512Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:15:45.0071971Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:15:45.0072428Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:15:45.0072800Z skip: Need at least 4 CUDA devices (2.309s) 2023-01-11T22:15:45.0072993Z 2023-01-11T22:15:45.0073265Z ---------------------------------------------------------------------- 2023-01-11T22:15:45.0073592Z Ran 11 tests in 27.965s 2023-01-11T22:15:45.0073754Z 2023-01-11T22:15:45.0073846Z OK (skipped=11) 2023-01-11T22:15:45.0074002Z 2023-01-11T22:15:45.0074126Z Generating XML reports... 2023-01-11T22:15:45.0074788Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_matrix_ops/TEST-TestShardedTensorMatrixOps-20230111221516.xml 2023-01-11T22:15:45.0075185Z 2023-01-11T22:15:45.0075647Z ##[endgroup] 2023-01-11T22:15:45.0076306Z FINISHED PRINTING LOG FILE of distributed/_shard/sharded_tensor/ops/test_matrix_ops (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-sharded_tensor-ops-test_matrix_ops_aa1br6jq) 2023-01-11T22:15:45.0076703Z 2023-01-11T22:15:45.0076974Z Running distributed/_tensor/test_matrix_ops ... 
[2023-01-11 22:15:44.992877] 2023-01-11T22:15:45.0077648Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/_tensor/test_matrix_ops.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:15:44.993119] 2023-01-11T22:16:21.0520552Z 2023-01-11T22:16:21.0521663Z Expand the folded group to see the log file of distributed/_tensor/test_matrix_ops 2023-01-11T22:16:21.0523185Z ##[group]PRINTING LOG FILE of distributed/_tensor/test_matrix_ops (/var/lib/jenkins/workspace/test/test-reports/distributed-_tensor-test_matrix_ops_cq3rtu6y) 2023-01-11T22:16:21.0523841Z 2023-01-11T22:16:21.0524041Z Running tests... 2023-01-11T22:16:21.0524911Z ---------------------------------------------------------------------- 2023-01-11T22:16:21.0525940Z Test results will be stored in test-reports/python-unittest/distributed._tensor.test_matrix_ops 2023-01-11T22:16:21.0526852Z test_addmm (__main__.DistMatrixOpsTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:16:21.0528105Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60205 2023-01-11T22:16:21.0528614Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 60206 2023-01-11T22:16:21.0529285Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:16:21.0529742Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:21.0530300Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:21.0530880Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:21.0533904Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:16:21.0534378Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:21.0535011Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:21.0535498Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:21.0536123Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:16:21.0537136Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:16:21.0537654Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:16:21.0538140Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:16:21.0538814Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:16:21.0539482Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:16:21.0539879Z ok (5.946s) 2023-01-11T22:16:21.0540320Z test_addmm_auto_redistribute (__main__.DistMatrixOpsTest) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60286 2023-01-11T22:16:21.0540830Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 60287 2023-01-11T22:16:21.0541433Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:16:21.0541882Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:21.0542455Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:21.0542903Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:21.0543479Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:16:21.0543926Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:21.0544475Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:21.0544945Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:21.0545383Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:16:21.0545872Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:16:21.0546333Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:16:21.0546809Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:16:21.0547465Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:16:21.0548148Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:16:21.0548651Z ok (4.312s) 2023-01-11T22:16:21.0549062Z test_baddbmm (__main__.DistMatrixOpsTest) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60371 2023-01-11T22:16:21.0549573Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 60372 2023-01-11T22:16:21.0550166Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:16:21.0550616Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:21.0551184Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:21.0551649Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:21.0552198Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:16:21.0552639Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:21.0553209Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:21.0553729Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:21.0554178Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:16:21.0554644Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:16:21.0555126Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:16:21.0555603Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:16:21.0556260Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:16:21.0556940Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:16:21.0557332Z ok (6.416s) 2023-01-11T22:16:21.0557720Z test_bmm (__main__.DistMatrixOpsTest) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60456 2023-01-11T22:16:21.0558222Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 60457 2023-01-11T22:16:21.0558825Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:16:21.0559252Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:21.0559820Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:21.0560333Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:21.0560908Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:16:21.0561351Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:21.0561908Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:21.0562372Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:21.0562805Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:16:21.0563290Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:16:21.0563752Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:16:21.0564235Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:16:21.0564883Z INFO:torch.distributed.distributed_c10d:Rank 0: 
Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:16:21.0565549Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:16:21.0566011Z ok (4.512s) 2023-01-11T22:16:21.0566417Z test_mm (__main__.DistMatrixOpsTest) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60541 2023-01-11T22:16:21.0566910Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 60542 2023-01-11T22:16:21.0567500Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:16:21.0567945Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:21.0568512Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:21.0568959Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:21.0569534Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:16:21.0569976Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:21.0570545Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:21.0571042Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:21.0571489Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:16:21.0571974Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:16:21.0572451Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:16:21.0572909Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 
2023-01-11T22:16:21.0573562Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:16:21.0574245Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:16:21.0574623Z ok (4.412s) 2023-01-11T22:16:21.0575023Z test_t (__main__.DistMatrixOpsTest) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60626 2023-01-11T22:16:21.0575520Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 60627 2023-01-11T22:16:21.0576119Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:16:21.0576971Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:21.0577744Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:21.0578217Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:21.0578777Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:16:21.0579224Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:21.0579794Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:21.0580256Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:21.0580672Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:16:21.0581157Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:16:21.0581638Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:16:21.0582114Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:16:21.0582744Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:16:21.0583550Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:16:21.0583943Z ok (3.810s) 2023-01-11T22:16:21.0584345Z test_t_partial (__main__.DistMatrixOpsTest) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60707 2023-01-11T22:16:21.0584852Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 60708 2023-01-11T22:16:21.0585449Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:16:21.0585889Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:21.0586442Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:21.0586906Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:21.0587477Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:16:21.0587921Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:21.0588541Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:21.0589018Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:21.0589449Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:16:21.0589915Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:16:21.0590395Z 
INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:16:21.0590882Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:16:21.0591536Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:16:21.0592209Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:16:21.0592595Z ok (4.311s) 2023-01-11T22:16:21.0592748Z 2023-01-11T22:16:21.0593020Z ---------------------------------------------------------------------- 2023-01-11T22:16:21.0593330Z Ran 7 tests in 33.720s 2023-01-11T22:16:21.0593492Z 2023-01-11T22:16:21.0593586Z OK 2023-01-11T22:16:21.0593718Z 2023-01-11T22:16:21.0593841Z Generating XML reports... 2023-01-11T22:16:21.0594429Z Generated XML report: test-reports/python-unittest/distributed._tensor.test_matrix_ops/TEST-DistMatrixOpsTest-20230111221546.xml 2023-01-11T22:16:21.0594759Z 2023-01-11T22:16:21.0595220Z ##[endgroup] 2023-01-11T22:16:21.0595821Z FINISHED PRINTING LOG FILE of distributed/_tensor/test_matrix_ops (/var/lib/jenkins/workspace/test/test-reports/distributed-_tensor-test_matrix_ops_cq3rtu6y) 2023-01-11T22:16:21.0596167Z 2023-01-11T22:16:21.0596456Z Running distributed/fsdp/test_fsdp_flatten_params ... [2023-01-11 22:16:21.052205] 2023-01-11T22:16:21.0597158Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_fsdp_flatten_params.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... 
[2023-01-11 22:16:21.052564] 2023-01-11T22:17:00.2477382Z 2023-01-11T22:17:00.2477871Z Expand the folded group to see the log file of distributed/fsdp/test_fsdp_flatten_params 2023-01-11T22:17:00.2479002Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_fsdp_flatten_params (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_flatten_params_81__dqxf) 2023-01-11T22:17:00.2479763Z 2023-01-11T22:17:00.2479991Z Running tests... 2023-01-11T22:17:00.2480510Z ---------------------------------------------------------------------- 2023-01-11T22:17:00.2482792Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_flatten_params 2023-01-11T22:17:00.2483262Z test_empty_module (__main__.TestFlattenParams) 2023-01-11T22:17:00.2484654Z Tests flattening an empty module (i.e. one without any parameters). ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:17:00.2485159Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60823 2023-01-11T22:17:00.2485829Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:17:00.2486270Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:00.2486852Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:00.2487325Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:00.2487784Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:17:00.2488444Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 
2023-01-11T22:17:00.2488986Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:17:00.2489883Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2023-01-11T22:17:00.2490383Z warnings.warn( 2023-01-11T22:17:00.2490623Z dist init r=0, world=1 2023-01-11T22:17:00.2490862Z ok (5.458s) 2023-01-11T22:17:00.2491179Z test_flat_param_shard_metadata (__main__.TestFlattenParams) 2023-01-11T22:17:00.2491684Z Tests that ``FlatParameter`` shard metadata are computed as expected. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60863 2023-01-11T22:17:00.2492389Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:17:00.2495111Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:00.2495786Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:00.2496292Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:00.2497116Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:17:00.2497846Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:17:00.2498842Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:17:00.2499339Z dist init r=0, world=1 2023-01-11T22:17:00.2499627Z ok (3.711s) 2023-01-11T22:17:00.2500112Z test_flatten_nothing (__main__.TestFlattenParams) 2023-01-11T22:17:00.2500622Z Tests that constructing a ``FlatParamHandle`` with no parameters ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60903 2023-01-11T22:17:00.2501335Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:17:00.2501791Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:00.2502349Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:00.2502826Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:00.2503275Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:17:00.2503915Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:17:00.2504442Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:17:00.2504803Z dist init r=0, world=1 2023-01-11T22:17:00.2505045Z ok (3.708s) 2023-01-11T22:17:00.2505336Z test_numel_with_shared_params (__main__.TestFlattenParams) 2023-01-11T22:17:00.2506038Z Tests that numel is preserved after flattening when there are shared ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60943 2023-01-11T22:17:00.2506744Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:17:00.2507232Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:00.2507787Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:00.2508256Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:00.2508711Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:17:00.2509364Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:17:00.2509863Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:17:00.2510219Z dist init r=0, world=1 2023-01-11T22:17:00.2510462Z ok (3.708s) 2023-01-11T22:17:00.2510758Z test_numel_without_shared_params (__main__.TestFlattenParams) 2023-01-11T22:17:00.2511376Z Tests that numel is preserved after flattening when there are no shared ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60983 2023-01-11T22:17:00.2512090Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:17:00.2512537Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:00.2513089Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:00.2513555Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:00.2514005Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:17:00.2514657Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:17:00.2515160Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:17:00.2515515Z dist init r=0, world=1 2023-01-11T22:17:00.2515758Z ok (3.708s) 2023-01-11T22:17:00.2516051Z test_output_with_shared_params (__main__.TestFlattenParams) 2023-01-11T22:17:00.2516571Z Tests a forward pass after flattening when there are shared parameters ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 61023 2023-01-11T22:17:00.2517270Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:17:00.2517719Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:00.2518269Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:00.2518738Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:00.2519190Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:17:00.2519829Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:17:00.2520342Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:17:00.2521174Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2023-01-11T22:17:00.2521656Z warnings.warn( 2023-01-11T22:17:00.2521890Z dist init r=0, world=1 2023-01-11T22:17:00.2522129Z ok (4.309s) 2023-01-11T22:17:00.2522444Z test_output_without_shared_params (__main__.TestFlattenParams) 2023-01-11T22:17:00.2522939Z Tests a forward pass after flattening when there are no shared ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 61063 2023-01-11T22:17:00.2523710Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:17:00.2524158Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:00.2524727Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:00.2525175Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:00.2525621Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:17:00.2526274Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:17:00.2526772Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:17:00.2527545Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2023-01-11T22:17:00.2528079Z warnings.warn( 2023-01-11T22:17:00.2528339Z dist init r=0, world=1 2023-01-11T22:17:00.2528565Z ok (4.211s) 2023-01-11T22:17:00.2528863Z test_partial_flattening (__main__.TestFlattenParams) 2023-01-11T22:17:00.2529340Z Tests flattening some submodules but not others. ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 61103 2023-01-11T22:17:00.2529986Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:17:00.2530431Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:00.2530997Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:00.2531466Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:00.2531896Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:17:00.2532547Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:17:00.2533063Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:17:00.2533828Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2023-01-11T22:17:00.2534288Z warnings.warn( 2023-01-11T22:17:00.2535439Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:17:00.2536290Z warnings.warn( 2023-01-11T22:17:00.2536864Z dist init r=0, world=1 2023-01-11T22:17:00.2537130Z ok (3.709s) 2023-01-11T22:17:00.2537443Z test_pnorm_after_step_with_shared_params (__main__.TestFlattenParams) 2023-01-11T22:17:00.2537975Z Tests for parameter Frobenius norm parity after an optimizer step when ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 61143 2023-01-11T22:17:00.2538681Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:17:00.2539110Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:00.2539679Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:00.2540259Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:00.2540711Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:17:00.2541353Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:17:00.2541871Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:17:00.2542638Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2023-01-11T22:17:00.2543116Z warnings.warn( 2023-01-11T22:17:00.2543348Z dist init r=0, world=1 2023-01-11T22:17:00.2543588Z ok (4.309s) 2023-01-11T22:17:00.2543736Z 2023-01-11T22:17:00.2544010Z ---------------------------------------------------------------------- 2023-01-11T22:17:00.2544323Z Ran 9 tests in 36.833s 2023-01-11T22:17:00.2544483Z 2023-01-11T22:17:00.2544577Z OK 2023-01-11T22:17:00.2544710Z 2023-01-11T22:17:00.2544835Z Generating XML reports... 
2023-01-11T22:17:00.2545498Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_flatten_params/TEST-TestFlattenParams-20230111221622.xml 2023-01-11T22:17:00.2545870Z 2023-01-11T22:17:00.2546257Z ##[endgroup] 2023-01-11T22:17:00.2546888Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_flatten_params (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_flatten_params_81__dqxf) 2023-01-11T22:17:00.2547259Z 2023-01-11T22:17:00.2547515Z Running distributed/test_c10d_common ... [2023-01-11 22:17:00.247749] 2023-01-11T22:17:00.2548182Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/test_c10d_common.py', '-v', '--subprocess', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:17:00.248000] 2023-01-11T22:18:09.2508713Z 2023-01-11T22:18:09.2509679Z Expand the folded group to see the log file of distributed/test_c10d_common 2023-01-11T22:18:09.2510688Z ##[group]PRINTING LOG FILE of distributed/test_c10d_common (/var/lib/jenkins/workspace/test/test-reports/distributed-test_c10d_common_x5tiy0nq) 2023-01-11T22:18:09.2511234Z ]> 2023-01-11T22:18:09.2511605Z test_debug_level (__main__.CommTest) 2023-01-11T22:18:09.2512391Z , <__main__.ComputeBucketAssignmentTest testMethod=test_multi_limit_single_dtype>, <__main__.ComputeBucketAssignmentTest testMethod=test_single_limit_multi_dtype>, <__main__.ComputeBucketAssignmentTest testMethod=test_single_limit_single_dtype>]> 2023-01-11T22:18:09.2513214Z test_multi_limit_multi_dtype (__main__.ComputeBucketAssignmentTest) 2023-01-11T22:18:09.2513642Z test_multi_limit_single_dtype (__main__.ComputeBucketAssignmentTest) 2023-01-11T22:18:09.2514080Z test_single_limit_multi_dtype (__main__.ComputeBucketAssignmentTest) 2023-01-11T22:18:09.2514498Z test_single_limit_single_dtype (__main__.ComputeBucketAssignmentTest) 2023-01-11T22:18:09.2515361Z , <__main__.PythonProcessGroupExtensionTest testMethod=test_collectives>, <__main__.PythonProcessGroupExtensionTest 
testMethod=test_get_backend_name>, <__main__.PythonProcessGroupExtensionTest testMethod=test_send_recv>]> 2023-01-11T22:18:09.2516188Z test_backend_class_attr (__main__.PythonProcessGroupExtensionTest) 2023-01-11T22:18:09.2516624Z test_collectives (__main__.PythonProcessGroupExtensionTest) 2023-01-11T22:18:09.2517043Z test_get_backend_name (__main__.PythonProcessGroupExtensionTest) 2023-01-11T22:18:09.2517462Z test_send_recv (__main__.PythonProcessGroupExtensionTest) 2023-01-11T22:18:09.2518153Z , <__main__.ReduceOpTest testMethod=test_reduceop_copyable>, <__main__.ReduceOpTest testMethod=test_reduceop_equal>, <__main__.ReduceOpTest testMethod=test_reduceop_pickle>]> 2023-01-11T22:18:09.2519070Z test_op_isinstance_of_reduceop (__main__.ReduceOpTest) 2023-01-11T22:18:09.2519403Z test_reduceop_copyable (__main__.ReduceOpTest) 2023-01-11T22:18:09.2519733Z test_reduceop_equal (__main__.ReduceOpTest) 2023-01-11T22:18:09.2520063Z test_reduceop_pickle (__main__.ReduceOpTest) 2023-01-11T22:18:09.2520716Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:09.2521173Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:09.2521754Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:09.2522226Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:09.2522473Z 2023-01-11T22:18:09.2522572Z Running tests... 2023-01-11T22:18:09.2522983Z ---------------------------------------------------------------------- 2023-01-11T22:18:09.2523690Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_common 2023-01-11T22:18:09.2524167Z test_debug_level (__main__.CommTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:18:09.2524590Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 61253 2023-01-11T22:18:09.2525038Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 61254 2023-01-11T22:18:09.2525652Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:09.2526088Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:09.2526662Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:09.2530094Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:09.2530708Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:09.2531186Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:09.2531762Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:09.2532232Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:09.2532652Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:18:09.2533132Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:18:09.2533476Z ok (3.953s) 2023-01-11T22:18:09.2533627Z 2023-01-11T22:18:09.2533882Z ---------------------------------------------------------------------- 2023-01-11T22:18:09.2534225Z Ran 1 test in 3.954s 2023-01-11T22:18:09.2534393Z 2023-01-11T22:18:09.2534491Z OK 2023-01-11T22:18:09.2534630Z 2023-01-11T22:18:09.2534831Z Generating XML reports... 
2023-01-11T22:18:09.2535388Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_common/TEST-CommTest-20230111221703.xml 2023-01-11T22:18:09.2536041Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:09.2536496Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:09.2537834Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:09.2538311Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:09.2538542Z 2023-01-11T22:18:09.2538638Z Running tests... 2023-01-11T22:18:09.2539042Z ---------------------------------------------------------------------- 2023-01-11T22:18:09.2539571Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_common 2023-01-11T22:18:09.2540254Z test_multi_limit_multi_dtype (__main__.ComputeBucketAssignmentTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:18:09.2540615Z ok (1.641s) 2023-01-11T22:18:09.2540764Z 2023-01-11T22:18:09.2541037Z ---------------------------------------------------------------------- 2023-01-11T22:18:09.2541357Z Ran 1 test in 1.642s 2023-01-11T22:18:09.2541519Z 2023-01-11T22:18:09.2541595Z OK 2023-01-11T22:18:09.2541728Z 2023-01-11T22:18:09.2541853Z Generating XML reports... 
2023-01-11T22:18:09.2542471Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_common/TEST-ComputeBucketAssignmentTest-20230111221710.xml 2023-01-11T22:18:09.2543188Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:09.2543619Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:09.2544186Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:09.2544652Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:09.2544877Z 2023-01-11T22:18:09.2544969Z Running tests... 2023-01-11T22:18:09.2545444Z ---------------------------------------------------------------------- 2023-01-11T22:18:09.2545990Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_common 2023-01-11T22:18:09.2546512Z test_multi_limit_single_dtype (__main__.ComputeBucketAssignmentTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:18:09.2546877Z ok (1.614s) 2023-01-11T22:18:09.2547026Z 2023-01-11T22:18:09.2547290Z ---------------------------------------------------------------------- 2023-01-11T22:18:09.2547611Z Ran 1 test in 1.615s 2023-01-11T22:18:09.2547771Z 2023-01-11T22:18:09.2547847Z OK 2023-01-11T22:18:09.2547980Z 2023-01-11T22:18:09.2548106Z Generating XML reports... 
2023-01-11T22:18:09.2548722Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_common/TEST-ComputeBucketAssignmentTest-20230111221714.xml 2023-01-11T22:18:09.2549441Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:09.2549877Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:09.2550453Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:09.2550920Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:09.2551150Z 2023-01-11T22:18:09.2551244Z Running tests... 2023-01-11T22:18:09.2551645Z ---------------------------------------------------------------------- 2023-01-11T22:18:09.2552168Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_common 2023-01-11T22:18:09.2552683Z test_single_limit_multi_dtype (__main__.ComputeBucketAssignmentTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:18:09.2553055Z ok (1.598s) 2023-01-11T22:18:09.2553202Z 2023-01-11T22:18:09.2553463Z ---------------------------------------------------------------------- 2023-01-11T22:18:09.2553791Z Ran 1 test in 1.598s 2023-01-11T22:18:09.2553951Z 2023-01-11T22:18:09.2554030Z OK 2023-01-11T22:18:09.2554164Z 2023-01-11T22:18:09.2554287Z Generating XML reports... 
2023-01-11T22:18:09.2554898Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_common/TEST-ComputeBucketAssignmentTest-20230111221717.xml 2023-01-11T22:18:09.2555615Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:09.2556043Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:09.2556613Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:09.2557077Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:09.2557376Z 2023-01-11T22:18:09.2557469Z Running tests... 2023-01-11T22:18:09.2557874Z ---------------------------------------------------------------------- 2023-01-11T22:18:09.2558410Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_common 2023-01-11T22:18:09.2558929Z test_single_limit_single_dtype (__main__.ComputeBucketAssignmentTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:18:09.2559296Z ok (1.607s) 2023-01-11T22:18:09.2559444Z 2023-01-11T22:18:09.2559706Z ---------------------------------------------------------------------- 2023-01-11T22:18:09.2560041Z Ran 1 test in 1.607s 2023-01-11T22:18:09.2560202Z 2023-01-11T22:18:09.2560277Z OK 2023-01-11T22:18:09.2560412Z 2023-01-11T22:18:09.2560534Z Generating XML reports... 
2023-01-11T22:18:09.2561143Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_common/TEST-ComputeBucketAssignmentTest-20230111221721.xml 2023-01-11T22:18:09.2561861Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:09.2562297Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:09.2562922Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:09.2563402Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:09.2563629Z 2023-01-11T22:18:09.2563721Z Running tests... 2023-01-11T22:18:09.2564130Z ---------------------------------------------------------------------- 2023-01-11T22:18:09.2564657Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_common 2023-01-11T22:18:09.2565183Z test_backend_class_attr (__main__.PythonProcessGroupExtensionTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:18:09.2565673Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 61492 2023-01-11T22:18:09.2566123Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 61493 2023-01-11T22:18:09.2566570Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 61494 2023-01-11T22:18:09.2566992Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 61495 2023-01-11T22:18:09.2567593Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:09.2568038Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:09.2568609Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:09.2569058Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:09.2569630Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:09.2570072Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:09.2570618Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:09.2571091Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:09.2571668Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:09.2572109Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:09.2572655Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:09.2573116Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:09.2573685Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:09.2574123Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:09.2574667Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:09.2575192Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:09.2575629Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:18:09.2576082Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:18:09.2577173Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:18:09.2577706Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:18:09.2578049Z ok (3.957s) 2023-01-11T22:18:09.2578180Z 2023-01-11T22:18:09.2578464Z ---------------------------------------------------------------------- 2023-01-11T22:18:09.2578792Z Ran 1 test in 3.958s 2023-01-11T22:18:09.2578954Z 2023-01-11T22:18:09.2579048Z OK 2023-01-11T22:18:09.2579184Z 2023-01-11T22:18:09.2579290Z Generating XML reports... 
2023-01-11T22:18:09.2579934Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_common/TEST-PythonProcessGroupExtensionTest-20230111221725.xml 2023-01-11T22:18:09.2580764Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:09.2581230Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:09.2581784Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:09.2582251Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:09.2582480Z 2023-01-11T22:18:09.2582591Z Running tests... 2023-01-11T22:18:09.2582970Z ---------------------------------------------------------------------- 2023-01-11T22:18:09.2583495Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_common 2023-01-11T22:18:09.2584012Z test_collectives (__main__.PythonProcessGroupExtensionTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:18:09.2584522Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 61663 2023-01-11T22:18:09.2584954Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 61664 2023-01-11T22:18:09.2585388Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 61665 2023-01-11T22:18:09.2585827Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 61666 2023-01-11T22:18:09.2586411Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:09.2586860Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:09.2587427Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:09.2587892Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:09.2588446Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:09.2588889Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:09.2589460Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:09.2589920Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:09.2590475Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:09.2590914Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:09.2591479Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:09.2591920Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:09.2592488Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:09.2593016Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:09.2593589Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:09.2594030Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:09.2594465Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:18:09.2595101Z [W socket.cpp:601] [c10d] The client socket has failed to connect to [localhost]:6789 (errno: 99 - Cannot assign requested address). 2023-01-11T22:18:09.2595581Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:18:09.2596049Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:18:09.2596526Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:18:09.2597009Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:18:09.2597519Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:18:09.2598019Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:18:09.2598502Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:18:09.2599160Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:18:09.2599924Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 
2023-01-11T22:18:09.2600593Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:18:09.2601255Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:18:09.2601649Z ok (6.839s) 2023-01-11T22:18:09.2601798Z 2023-01-11T22:18:09.2602067Z ---------------------------------------------------------------------- 2023-01-11T22:18:09.2602376Z Ran 1 test in 6.839s 2023-01-11T22:18:09.2602536Z 2023-01-11T22:18:09.2602632Z OK 2023-01-11T22:18:09.2602764Z 2023-01-11T22:18:09.2602888Z Generating XML reports... 2023-01-11T22:18:09.2603526Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_common/TEST-PythonProcessGroupExtensionTest-20230111221731.xml 2023-01-11T22:18:09.2604240Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:09.2604688Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:09.2605255Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:09.2605725Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:09.2605938Z 2023-01-11T22:18:09.2606048Z Running tests... 2023-01-11T22:18:09.2606451Z ---------------------------------------------------------------------- 2023-01-11T22:18:09.2606978Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_common 2023-01-11T22:18:09.2607482Z test_get_backend_name (__main__.PythonProcessGroupExtensionTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:18:09.2607982Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 61843 2023-01-11T22:18:09.2608428Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 61844 2023-01-11T22:18:09.2608869Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 61845 2023-01-11T22:18:09.2609287Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 61846 2023-01-11T22:18:09.2609888Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:09.2610410Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:09.2610969Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:09.2611435Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:09.2612007Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:09.2612450Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:09.2612997Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:09.2613458Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:09.2614031Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:09.2614454Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:09.2615067Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:09.2615536Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:09.2616108Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:09.2616527Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:09.2617750Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:09.2618212Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:09.2618632Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:18:09.2619110Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:18:09.2619569Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:18:09.2620037Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:18:09.2620363Z ok (3.958s) 2023-01-11T22:18:09.2620510Z 2023-01-11T22:18:09.2620779Z ---------------------------------------------------------------------- 2023-01-11T22:18:09.2621106Z Ran 1 test in 3.958s 2023-01-11T22:18:09.2621267Z 2023-01-11T22:18:09.2621342Z OK 2023-01-11T22:18:09.2621478Z 2023-01-11T22:18:09.2621604Z Generating XML reports... 
2023-01-11T22:18:09.2622239Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_common/TEST-PythonProcessGroupExtensionTest-20230111221740.xml 2023-01-11T22:18:09.2622974Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:09.2623455Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:09.2624032Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:09.2624497Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:09.2624725Z 2023-01-11T22:18:09.2624837Z Running tests... 2023-01-11T22:18:09.2625215Z ---------------------------------------------------------------------- 2023-01-11T22:18:09.2625746Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_common 2023-01-11T22:18:09.2626256Z test_send_recv (__main__.PythonProcessGroupExtensionTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:18:09.2626734Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 62014 2023-01-11T22:18:09.2627180Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 62015 2023-01-11T22:18:09.2627623Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 62016 2023-01-11T22:18:09.2628195Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 62017 2023-01-11T22:18:09.2628790Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:09.2629240Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:09.2629812Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:09.2630259Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:09.2630833Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:09.2631275Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:09.2631838Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:09.2632284Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:09.2632920Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:09.2633371Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:09.2633921Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:09.2634381Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:09.2634946Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:09.2635385Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:09.2635925Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:09.2636387Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:09.2636820Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:18:09.2637292Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:18:09.2637761Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:18:09.2638241Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:18:09.2638720Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:18:09.2639177Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:18:09.2639654Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:18:09.2640142Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:18:09.2640802Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:18:09.2641466Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:18:09.2642145Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 
2023-01-11T22:18:09.2642819Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:18:09.2643202Z ok (5.747s) 2023-01-11T22:18:09.2643333Z 2023-01-11T22:18:09.2643603Z ---------------------------------------------------------------------- 2023-01-11T22:18:09.2643931Z Ran 1 test in 5.748s 2023-01-11T22:18:09.2644092Z 2023-01-11T22:18:09.2644185Z OK 2023-01-11T22:18:09.2644319Z 2023-01-11T22:18:09.2644426Z Generating XML reports... 2023-01-11T22:18:09.2645131Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_common/TEST-PythonProcessGroupExtensionTest-20230111221747.xml 2023-01-11T22:18:09.2645867Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:09.2646313Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:09.2646862Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:09.2647324Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:09.2647549Z 2023-01-11T22:18:09.2647659Z Running tests... 2023-01-11T22:18:09.2648038Z ---------------------------------------------------------------------- 2023-01-11T22:18:09.2648563Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_common 2023-01-11T22:18:09.2649049Z test_op_isinstance_of_reduceop (__main__.ReduceOpTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:18:09.2649408Z ok (1.625s) 2023-01-11T22:18:09.2649537Z 2023-01-11T22:18:09.2649803Z ---------------------------------------------------------------------- 2023-01-11T22:18:09.2650178Z Ran 1 test in 1.625s 2023-01-11T22:18:09.2650346Z 2023-01-11T22:18:09.2650439Z OK 2023-01-11T22:18:09.2650574Z 2023-01-11T22:18:09.2650679Z Generating XML reports... 
2023-01-11T22:18:09.2651236Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_common/TEST-ReduceOpTest-20230111221755.xml 2023-01-11T22:18:09.2651907Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:09.2652341Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:09.2652911Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:09.2653377Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:09.2653611Z 2023-01-11T22:18:09.2653721Z Running tests... 2023-01-11T22:18:09.2654102Z ---------------------------------------------------------------------- 2023-01-11T22:18:09.2654632Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_common 2023-01-11T22:18:09.2655109Z test_reduceop_copyable (__main__.ReduceOpTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:18:09.2655437Z ok (1.623s) 2023-01-11T22:18:09.2655583Z 2023-01-11T22:18:09.2655843Z ---------------------------------------------------------------------- 2023-01-11T22:18:09.2656167Z Ran 1 test in 1.623s 2023-01-11T22:18:09.2656327Z 2023-01-11T22:18:09.2656401Z OK 2023-01-11T22:18:09.2657132Z 2023-01-11T22:18:09.2657389Z Generating XML reports... 
2023-01-11T22:18:09.2657969Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_common/TEST-ReduceOpTest-20230111221759.xml 2023-01-11T22:18:09.2658641Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:09.2659076Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:09.2659651Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:09.2660115Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:09.2660345Z 2023-01-11T22:18:09.2660452Z Running tests... 2023-01-11T22:18:09.2660837Z ---------------------------------------------------------------------- 2023-01-11T22:18:09.2661361Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_common 2023-01-11T22:18:09.2661830Z test_reduceop_equal (__main__.ReduceOpTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:18:09.2662156Z ok (1.637s) 2023-01-11T22:18:09.2662305Z 2023-01-11T22:18:09.2662569Z ---------------------------------------------------------------------- 2023-01-11T22:18:09.2663006Z Ran 1 test in 1.637s 2023-01-11T22:18:09.2663167Z 2023-01-11T22:18:09.2663242Z OK 2023-01-11T22:18:09.2663373Z 2023-01-11T22:18:09.2663497Z Generating XML reports... 
2023-01-11T22:18:09.2664054Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_common/TEST-ReduceOpTest-20230111221803.xml 2023-01-11T22:18:09.2664723Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:09.2665155Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:09.2665729Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:09.2666191Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:09.2666420Z 2023-01-11T22:18:09.2666531Z Running tests... 2023-01-11T22:18:09.2666914Z ---------------------------------------------------------------------- 2023-01-11T22:18:09.2667438Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_common 2023-01-11T22:18:09.2667915Z test_reduceop_pickle (__main__.ReduceOpTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:18:09.2668237Z ok (1.648s) 2023-01-11T22:18:09.2668454Z 2023-01-11T22:18:09.2668729Z ---------------------------------------------------------------------- 2023-01-11T22:18:09.2683943Z Ran 1 test in 1.648s 2023-01-11T22:18:09.2684177Z 2023-01-11T22:18:09.2684259Z OK 2023-01-11T22:18:09.2684397Z 2023-01-11T22:18:09.2684530Z Generating XML reports... 2023-01-11T22:18:09.2685147Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_common/TEST-ReduceOpTest-20230111221806.xml 2023-01-11T22:18:09.2685480Z 2023-01-11T22:18:09.2685872Z ##[endgroup] 2023-01-11T22:18:09.2686448Z FINISHED PRINTING LOG FILE of distributed/test_c10d_common (/var/lib/jenkins/workspace/test/test-reports/distributed-test_c10d_common_x5tiy0nq) 2023-01-11T22:18:09.2686778Z 2023-01-11T22:18:09.2687042Z Running distributed/fsdp/test_fsdp_comm ... 
[2023-01-11 22:18:09.251246] 2023-01-11T22:18:09.2687721Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_fsdp_comm.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:18:09.251491] 2023-01-11T22:18:50.4853524Z 2023-01-11T22:18:50.4854183Z Expand the folded group to see the log file of distributed/fsdp/test_fsdp_comm 2023-01-11T22:18:50.4855362Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_fsdp_comm (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_comm_q26zwve0) 2023-01-11T22:18:50.4855964Z 2023-01-11T22:18:50.4859149Z Running tests... 2023-01-11T22:18:50.4859993Z ---------------------------------------------------------------------- 2023-01-11T22:18:50.4860592Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_comm 2023-01-11T22:18:50.4861116Z test_communication_nested_model_False_use_no_sync_False_sharding_strategy_None (__main__.TestCommunication) 2023-01-11T22:18:50.4861741Z Tests FSDP's communication cost in terms of calls to collective ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:18:50.4862213Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 62330 2023-01-11T22:18:50.4863593Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 62331 2023-01-11T22:18:50.4864296Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:50.4864757Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:50.4865313Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:50.4865840Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:50.4866717Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:50.4867152Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:50.4868012Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:50.4868488Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:50.4868945Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:18:50.4869424Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:18:50.4870086Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:18:50.4870768Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:18:50.4871288Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:18:50.4871736Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:18:50.4872088Z dist init r=0, world=2 2023-01-11T22:18:50.4872343Z dist init r=1, world=2 2023-01-11T22:18:50.4872565Z ok (6.414s) 2023-01-11T22:18:50.4873103Z test_communication_nested_model_False_use_no_sync_False_sharding_strategy_ShardingStrategy_SHARD_GRAD_OP (__main__.TestCommunication) 2023-01-11T22:18:50.4873869Z Tests FSDP's communication cost in terms of calls to collective ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 62413 2023-01-11T22:18:50.4874409Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 62414 2023-01-11T22:18:50.4874993Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:50.4875451Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:50.4876020Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:50.4876489Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:50.4877046Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:50.4877487Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:50.4878049Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:50.4878490Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:50.4878940Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:18:50.4879429Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:18:50.4880077Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:18:50.4880742Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:18:50.4881256Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:18:50.4881719Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:18:50.4882142Z dist init r=0, world=2 2023-01-11T22:18:50.4882394Z dist init r=1, world=2 2023-01-11T22:18:50.4882614Z ok (4.813s) 2023-01-11T22:18:50.4882990Z test_communication_nested_model_False_use_no_sync_True_sharding_strategy_None (__main__.TestCommunication) 2023-01-11T22:18:50.4883683Z Tests FSDP's communication cost in terms of calls to collective ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 62496 2023-01-11T22:18:50.4884210Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 62497 2023-01-11T22:18:50.4884790Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:50.4885306Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:50.4885883Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:50.4886349Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:50.4886908Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:50.4887341Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:50.4887901Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:50.4888342Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:50.4888785Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:18:50.4889277Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:18:50.4889983Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:18:50.4890657Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:18:50.4891176Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:18:50.4891641Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:18:50.4891979Z dist init r=0, world=2 2023-01-11T22:18:50.4892230Z dist init r=1, world=2 2023-01-11T22:18:50.4892466Z ok (4.913s) 2023-01-11T22:18:50.4892884Z test_communication_nested_model_False_use_no_sync_True_sharding_strategy_ShardingStrategy_SHARD_GRAD_OP (__main__.TestCommunication) 2023-01-11T22:18:50.4893603Z Tests FSDP's communication cost in terms of calls to collective ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 62579 2023-01-11T22:18:50.4894134Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 62580 2023-01-11T22:18:50.4894734Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:50.4895165Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:50.4895732Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:50.4896192Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:50.4897284Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:50.4897727Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:50.4898312Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:50.4898772Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:50.4899226Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:18:50.4899700Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:18:50.4900345Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:18:50.4901026Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:18:50.4901523Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:18:50.4901988Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:18:50.4902454Z dist init r=1, world=2 2023-01-11T22:18:50.4902707Z dist init r=0, world=2 2023-01-11T22:18:50.4902929Z ok (4.813s) 2023-01-11T22:18:50.4903309Z test_communication_nested_model_True_use_no_sync_False_sharding_strategy_None (__main__.TestCommunication) 2023-01-11T22:18:50.4904010Z Tests FSDP's communication cost in terms of calls to collective ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 62662 2023-01-11T22:18:50.4904519Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 62663 2023-01-11T22:18:50.4905118Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:50.4905560Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:50.4906128Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:50.4906573Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:50.4907150Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:50.4907660Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:50.4908228Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:50.4908693Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:50.4909139Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:18:50.4909631Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:18:50.4910263Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:18:50.4910943Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:18:50.4911459Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:18:50.4911929Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:18:50.4913173Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:18:50.4913959Z warnings.warn( 2023-01-11T22:18:50.4915110Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:18:50.4915881Z warnings.warn( 2023-01-11T22:18:50.4916131Z dist init r=0, world=2 2023-01-11T22:18:50.4916361Z dist init r=1, world=2 2023-01-11T22:18:50.4916599Z ok (4.512s) 2023-01-11T22:18:50.4917014Z test_communication_nested_model_True_use_no_sync_False_sharding_strategy_ShardingStrategy_SHARD_GRAD_OP (__main__.TestCommunication) 2023-01-11T22:18:50.4917728Z Tests FSDP's communication cost in terms of calls to collective ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 62745 2023-01-11T22:18:50.4918255Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 62746 2023-01-11T22:18:50.4918855Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:50.4919370Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:50.4919928Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:50.4920394Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:50.4920966Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:50.4921388Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:50.4921950Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:50.4922404Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:50.4922845Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:18:50.4923319Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:18:50.4924016Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:18:50.4924762Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:18:50.4925285Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:18:50.4925733Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:18:50.4926986Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:18:50.4927764Z warnings.warn( 2023-01-11T22:18:50.4928914Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:18:50.4929676Z warnings.warn( 2023-01-11T22:18:50.4929906Z dist init r=0, world=2 2023-01-11T22:18:50.4930152Z dist init r=1, world=2 2023-01-11T22:18:50.4930386Z ok (4.513s) 2023-01-11T22:18:50.4930743Z test_communication_nested_model_True_use_no_sync_True_sharding_strategy_None (__main__.TestCommunication) 2023-01-11T22:18:50.4931439Z Tests FSDP's communication cost in terms of calls to collective ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 62828 2023-01-11T22:18:50.4931971Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 62829 2023-01-11T22:18:50.4932571Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:50.4932997Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:50.4933565Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:50.4934026Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:50.4934581Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:50.4935019Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:50.4935657Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:50.4936119Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:50.4936993Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:18:50.4937541Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:18:50.4938207Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:18:50.4938885Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:18:50.4939387Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:18:50.4939854Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:18:50.4941204Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:18:50.4941987Z warnings.warn( 2023-01-11T22:18:50.4943120Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:18:50.4943894Z warnings.warn( 2023-01-11T22:18:50.4944141Z dist init r=0, world=2 2023-01-11T22:18:50.4944386Z dist init r=1, world=2 2023-01-11T22:18:50.4944604Z ok (4.513s) 2023-01-11T22:18:50.4945016Z test_communication_nested_model_True_use_no_sync_True_sharding_strategy_ShardingStrategy_SHARD_GRAD_OP (__main__.TestCommunication) 2023-01-11T22:18:50.4945746Z Tests FSDP's communication cost in terms of calls to collective ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 62911 2023-01-11T22:18:50.4946273Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 62912 2023-01-11T22:18:50.4946854Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:50.4947302Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:50.4947869Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:50.4948323Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:50.4948899Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:18:50.4949340Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:50.4949906Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:50.4950346Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:50.4950792Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:18:50.4951277Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:18:50.4951926Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:18:50.4952686Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:18:50.4953205Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:18:50.4953673Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:18:50.4954935Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:18:50.4955691Z warnings.warn( 2023-01-11T22:18:50.4956877Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:18:50.4957657Z warnings.warn( 2023-01-11T22:18:50.4957903Z dist init r=1, world=2 2023-01-11T22:18:50.4958131Z dist init r=0, world=2 2023-01-11T22:18:50.4958365Z ok (4.412s) 2023-01-11T22:18:50.4958511Z 2023-01-11T22:18:50.4958780Z ---------------------------------------------------------------------- 2023-01-11T22:18:50.4959104Z Ran 8 tests in 38.905s 2023-01-11T22:18:50.4959247Z 2023-01-11T22:18:50.4959339Z OK 2023-01-11T22:18:50.4959475Z 2023-01-11T22:18:50.4959600Z Generating XML reports... 
2023-01-11T22:18:50.4960184Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_comm/TEST-TestCommunication-20230111221811.xml 2023-01-11T22:18:50.4960531Z 2023-01-11T22:18:50.4960858Z ##[endgroup] 2023-01-11T22:18:50.4961445Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_comm (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_comm_q26zwve0) 2023-01-11T22:18:50.4961793Z 2023-01-11T22:18:50.4962083Z Running distributed/fsdp/test_fsdp_freezing_weights ... [2023-01-11 22:18:50.485446] 2023-01-11T22:18:50.4962769Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_fsdp_freezing_weights.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:18:50.485737] 2023-01-11T22:19:38.1630666Z 2023-01-11T22:19:38.1631741Z Expand the folded group to see the log file of distributed/fsdp/test_fsdp_freezing_weights 2023-01-11T22:19:38.1633329Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_fsdp_freezing_weights (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_freezing_weights_bbzusvze) 2023-01-11T22:19:38.1634088Z 2023-01-11T22:19:38.1634292Z Running tests... 2023-01-11T22:19:38.1635119Z ---------------------------------------------------------------------- 2023-01-11T22:19:38.1637676Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_freezing_weights 2023-01-11T22:19:38.1638542Z test_freezing_weights_with_nested_trunk_False_freezing_method_FreezingMethod_GradToNone_freeze_after_wrap_fsdp_False (__main__.TestFreezingWeights) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:19:38.1639319Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63029 2023-01-11T22:19:38.1639772Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63030 2023-01-11T22:19:38.1640418Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:19:38.1641097Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:38.1641727Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:38.1642460Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:38.1643386Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:19:38.1643833Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:38.1644410Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:38.1644879Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:38.1645317Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:19:38.1645808Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:19:38.1646469Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:19:38.1647160Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:19:38.1647794Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:19:38.1648288Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:19:38.1648769Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:38.1649248Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:38.1649593Z dist init r=0, world=2 2023-01-11T22:19:38.1649869Z dist init r=1, world=2 2023-01-11T22:19:38.1650108Z ok (7.056s) 2023-01-11T22:19:38.1650652Z test_freezing_weights_with_nested_trunk_False_freezing_method_FreezingMethod_GradToNone_freeze_after_wrap_fsdp_True (__main__.TestFreezingWeights) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63112 2023-01-11T22:19:38.1651389Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63113 2023-01-11T22:19:38.1652019Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:19:38.1652479Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:38.1653042Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:38.1653491Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:38.1654062Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:19:38.1654505Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:38.1655382Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:38.1655866Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:38.1656320Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:19:38.1657321Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:19:38.1657994Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:19:38.1658681Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:19:38.1659181Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:19:38.1659644Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:19:38.1660113Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:38.1660738Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:38.1662000Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:19:38.1662782Z warnings.warn( 2023-01-11T22:19:38.1663916Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:19:38.1664686Z warnings.warn( 2023-01-11T22:19:38.1665025Z dist init r=1, world=2 2023-01-11T22:19:38.1665273Z dist init r=0, world=2 2023-01-11T22:19:38.1665508Z ok (5.514s) 2023-01-11T22:19:38.1666071Z test_freezing_weights_with_nested_trunk_False_freezing_method_FreezingMethod_RequiresGrad_freeze_after_wrap_fsdp_False (__main__.TestFreezingWeights) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63195 2023-01-11T22:19:38.1666706Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63196 2023-01-11T22:19:38.1667318Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:19:38.1667762Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:38.1668329Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:38.1668784Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:38.1669358Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:19:38.1669796Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:38.1670366Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:38.1670810Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:38.1671262Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:19:38.1671750Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:19:38.1672379Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed 
store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:19:38.1673067Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:19:38.1673587Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:19:38.1674055Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:19:38.1674511Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:38.1674989Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:38.1675348Z dist init r=1, world=2 2023-01-11T22:19:38.1675583Z dist init r=0, world=2 2023-01-11T22:19:38.1675816Z ok (5.414s) 2023-01-11T22:19:38.1676376Z test_freezing_weights_with_nested_trunk_False_freezing_method_FreezingMethod_RequiresGrad_freeze_after_wrap_fsdp_True (__main__.TestFreezingWeights) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63278 2023-01-11T22:19:38.1677107Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63279 2023-01-11T22:19:38.1677705Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:19:38.1678148Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:38.1678714Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:38.1679179Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:38.1679732Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:19:38.1680171Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:38.1680732Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:38.1681177Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:38.1681685Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:19:38.1682189Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:19:38.1682842Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:19:38.1683510Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:19:38.1684026Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:19:38.1684494Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:19:38.1684947Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:38.1685429Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:38.1686695Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:19:38.1687471Z warnings.warn( 2023-01-11T22:19:38.1688610Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:19:38.1689375Z warnings.warn( 2023-01-11T22:19:38.1689610Z dist init r=1, world=2 2023-01-11T22:19:38.1689860Z dist init r=0, world=2 2023-01-11T22:19:38.1690097Z ok (5.414s) 2023-01-11T22:19:38.1690636Z test_freezing_weights_with_nested_trunk_True_freezing_method_FreezingMethod_GradToNone_freeze_after_wrap_fsdp_False (__main__.TestFreezingWeights) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63361 2023-01-11T22:19:38.1691275Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63362 2023-01-11T22:19:38.1691879Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:19:38.1692321Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:38.1692871Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:38.1693405Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:38.1693985Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:19:38.1694407Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:38.1694973Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:38.1695432Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:38.1695883Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:19:38.1696356Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:19:38.1697297Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:19:38.1697987Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:19:38.1698645Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:19:38.1699113Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:19:38.1699579Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:38.1700056Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:38.1700395Z dist init r=0, world=2 2023-01-11T22:19:38.1700651Z dist init r=1, world=2 2023-01-11T22:19:38.1700890Z ok (5.513s) 2023-01-11T22:19:38.1701427Z test_freezing_weights_with_nested_trunk_True_freezing_method_FreezingMethod_GradToNone_freeze_after_wrap_fsdp_True (__main__.TestFreezingWeights) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63444 2023-01-11T22:19:38.1702072Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63445 2023-01-11T22:19:38.1702685Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:19:38.1703129Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:38.1703680Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:38.1704143Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:38.1704714Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:19:38.1705154Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:38.1705699Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:38.1706163Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:38.1706612Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:19:38.1707085Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:19:38.1707734Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:19:38.1708411Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:19:38.1708929Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:19:38.1709377Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:19:38.1709845Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:38.1710424Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:38.1711697Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:19:38.1712459Z warnings.warn( 2023-01-11T22:19:38.1713587Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:19:38.1714353Z warnings.warn( 2023-01-11T22:19:38.1714660Z dist init r=1, world=2 2023-01-11T22:19:38.1714919Z dist init r=0, world=2 2023-01-11T22:19:38.1715137Z ok (5.513s) 2023-01-11T22:19:38.1715695Z test_freezing_weights_with_nested_trunk_True_freezing_method_FreezingMethod_RequiresGrad_freeze_after_wrap_fsdp_False (__main__.TestFreezingWeights) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63557 2023-01-11T22:19:38.1716342Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63558 2023-01-11T22:19:38.1716932Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:19:38.1717374Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:38.1717940Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:38.1718406Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:38.1718961Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:19:38.1719407Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:38.1719973Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:38.1720429Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:38.1720860Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:19:38.1721349Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:19:38.1721997Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed 
store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:19:38.1722665Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:19:38.1723183Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:19:38.1723648Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:19:38.1724121Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:38.1724578Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:38.1724933Z dist init r=0, world=2 2023-01-11T22:19:38.1725185Z dist init r=1, world=2 2023-01-11T22:19:38.1725405Z ok (5.513s) 2023-01-11T22:19:38.1725961Z test_freezing_weights_with_nested_trunk_True_freezing_method_FreezingMethod_RequiresGrad_freeze_after_wrap_fsdp_True (__main__.TestFreezingWeights) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63640 2023-01-11T22:19:38.1726754Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63641 2023-01-11T22:19:38.1727366Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:19:38.1727794Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:38.1728361Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:38.1728824Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:38.1729376Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:19:38.1729813Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:38.1730379Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:38.1730841Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:38.1731332Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:19:38.1731832Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:19:38.1732482Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:19:38.1733156Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:19:38.1733651Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:19:38.1734116Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:19:38.1734592Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:38.1735057Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:38.1736324Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:19:38.1737460Z warnings.warn( 2023-01-11T22:19:38.1738622Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:19:38.1739393Z warnings.warn( 2023-01-11T22:19:38.1739647Z dist init r=0, world=2 2023-01-11T22:19:38.1739880Z dist init r=1, world=2 2023-01-11T22:19:38.1740118Z ok (5.413s) 2023-01-11T22:19:38.1740265Z 2023-01-11T22:19:38.1740537Z ---------------------------------------------------------------------- 2023-01-11T22:19:38.1740849Z Ran 8 tests in 45.351s 2023-01-11T22:19:38.1741006Z 2023-01-11T22:19:38.1741100Z OK 2023-01-11T22:19:38.1741232Z 2023-01-11T22:19:38.1741355Z Generating XML reports... 2023-01-11T22:19:38.1741959Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_freezing_weights/TEST-TestFreezingWeights-20230111221852.xml 2023-01-11T22:19:38.1742331Z 2023-01-11T22:19:38.1742730Z ##[endgroup] 2023-01-11T22:19:38.1743365Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_freezing_weights (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_freezing_weights_bbzusvze) 2023-01-11T22:19:38.1743859Z 2023-01-11T22:19:38.1744137Z Running distributed/fsdp/test_fsdp_grad_acc ... [2023-01-11 22:19:38.163182] 2023-01-11T22:19:38.1744805Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_fsdp_grad_acc.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:19:38.163527] 2023-01-11T22:20:45.3211005Z 2023-01-11T22:20:45.3211530Z Expand the folded group to see the log file of distributed/fsdp/test_fsdp_grad_acc 2023-01-11T22:20:45.3214136Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_fsdp_grad_acc (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_grad_acc_nii6a7yt) 2023-01-11T22:20:45.3214581Z 2023-01-11T22:20:45.3214680Z Running tests... 
2023-01-11T22:20:45.3217420Z ---------------------------------------------------------------------- 2023-01-11T22:20:45.3218027Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_grad_acc 2023-01-11T22:20:45.3218945Z test_grad_acc_configs_[(use_no_sync=False,num_iters=3),(use_no_sync=True,num_iters=3),(use_no_sync=False,num_iters=3)]_sharding_strategy_ShardingStrategy_FULL_SHARD_use_orig_params_False (__main__.TestGradAcc) 2023-01-11T22:20:45.3219863Z Tests gradient accumulation. ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:20:45.3220324Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63788 2023-01-11T22:20:45.3220753Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63789 2023-01-11T22:20:45.3221394Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:20:45.3221848Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:20:45.3222441Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:20:45.3222901Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:20:45.3223485Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:20:45.3223940Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:20:45.3224500Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:20:45.3224969Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:20:45.3225427Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:20:45.3225927Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for 
rank: 0 2023-01-11T22:20:45.3226582Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:20:45.3227273Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:20:45.3227796Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:20:45.3228274Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:20:45.3229349Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3230683Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3232086Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3233323Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:20:45.3234544Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3235837Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3237063Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3238284Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3238891Z dist init r=1, world=2 2023-01-11T22:20:45.3239126Z dist init r=0, world=2 2023-01-11T22:20:45.3239363Z ok (6.772s) 2023-01-11T22:20:45.3239865Z test_grad_acc_configs_[(use_no_sync=False,num_iters=3),(use_no_sync=True,num_iters=3),(use_no_sync=False,num_iters=3)]_sharding_strategy_ShardingStrategy_FULL_SHARD_use_orig_params_True (__main__.TestGradAcc) 2023-01-11T22:20:45.3240500Z Tests gradient accumulation. ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63871 2023-01-11T22:20:45.3240975Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63872 2023-01-11T22:20:45.3241581Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:20:45.3242036Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:20:45.3242590Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:20:45.3243061Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:20:45.3243641Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:20:45.3244087Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:20:45.3244644Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:20:45.3245260Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:20:45.3245713Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:20:45.3246208Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:20:45.3246848Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:20:45.3247535Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:20:45.3248131Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:20:45.3248601Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:20:45.3248936Z dist init r=1, world=2 2023-01-11T22:20:45.3249189Z dist init r=0, world=2 2023-01-11T22:20:45.3249428Z ok (5.413s) 2023-01-11T22:20:45.3249904Z test_grad_acc_configs_[(use_no_sync=False,num_iters=3),(use_no_sync=True,num_iters=3),(use_no_sync=False,num_iters=3)]_sharding_strategy_ShardingStrategy_NO_SHARD_use_orig_params_False (__main__.TestGradAcc) 2023-01-11T22:20:45.3250528Z Tests gradient accumulation. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63954 2023-01-11T22:20:45.3251019Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63955 2023-01-11T22:20:45.3251626Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:20:45.3252059Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:20:45.3252684Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:20:45.3253163Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:20:45.3253724Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:20:45.3254174Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:20:45.3254748Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:20:45.3255210Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:20:45.3255642Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:20:45.3256142Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:20:45.3257088Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:20:45.3257777Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:20:45.3258295Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:20:45.3258765Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:20:45.3259755Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3261004Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3262206Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3263436Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:20:45.3264768Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3265980Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3267202Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3268478Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3269106Z dist init r=1, world=2 2023-01-11T22:20:45.3269340Z dist init r=0, world=2 2023-01-11T22:20:45.3269579Z ok (5.013s) 2023-01-11T22:20:45.3270076Z test_grad_acc_configs_[(use_no_sync=False,num_iters=3),(use_no_sync=True,num_iters=3),(use_no_sync=False,num_iters=3)]_sharding_strategy_ShardingStrategy_NO_SHARD_use_orig_params_True (__main__.TestGradAcc) 2023-01-11T22:20:45.3270704Z Tests gradient accumulation. ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64037 2023-01-11T22:20:45.3271182Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64038 2023-01-11T22:20:45.3271801Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:20:45.3272250Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:20:45.3272808Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:20:45.3273277Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:20:45.3273852Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:20:45.3274298Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:20:45.3274848Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:20:45.3275310Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:20:45.3275762Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:20:45.3276260Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:20:45.3276904Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:20:45.3277587Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:20:45.3278103Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:20:45.3278552Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:20:45.3279540Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3280863Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3282078Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3283301Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3284560Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:20:45.3285786Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3286992Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3288191Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3289399Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3290599Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3291811Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3293010Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3294224Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3295490Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3296963Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3298166Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3299481Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3300704Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3301903Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3303117Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3303712Z dist init r=0, world=2 2023-01-11T22:20:45.3303945Z dist init r=1, world=2 2023-01-11T22:20:45.3304184Z ok (5.413s) 2023-01-11T22:20:45.3304685Z test_grad_acc_configs_[(use_no_sync=False,num_iters=3),(use_no_sync=True,num_iters=3),(use_no_sync=False,num_iters=3)]_sharding_strategy_ShardingStrategy_SHARD_GRAD_OP_use_orig_params_False (__main__.TestGradAcc) 2023-01-11T22:20:45.3305319Z Tests gradient accumulation. ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64120 2023-01-11T22:20:45.3305794Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64121 2023-01-11T22:20:45.3306405Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:20:45.3306855Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:20:45.3307430Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:20:45.3307882Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:20:45.3308458Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:20:45.3308900Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:20:45.3309449Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:20:45.3309911Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:20:45.3310363Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:20:45.3310947Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:20:45.3311584Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:20:45.3312269Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:20:45.3312791Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:20:45.3313261Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:20:45.3314235Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3315527Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3316766Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3317990Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3319213Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:20:45.3320426Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3321640Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3322849Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3323437Z dist init r=0, world=2 2023-01-11T22:20:45.3323690Z dist init r=1, world=2 2023-01-11T22:20:45.3323931Z ok (5.113s) 2023-01-11T22:20:45.3324414Z test_grad_acc_configs_[(use_no_sync=False,num_iters=3),(use_no_sync=True,num_iters=3),(use_no_sync=False,num_iters=3)]_sharding_strategy_ShardingStrategy_SHARD_GRAD_OP_use_orig_params_True (__main__.TestGradAcc) 2023-01-11T22:20:45.3325054Z Tests gradient accumulation. ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64203 2023-01-11T22:20:45.3325545Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64204 2023-01-11T22:20:45.3326291Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:20:45.3326727Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:20:45.3327299Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:20:45.3327770Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:20:45.3328347Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:20:45.3328824Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:20:45.3329402Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:20:45.3329866Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:20:45.3330302Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:20:45.3330800Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:20:45.3331510Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:20:45.3332210Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:20:45.3332710Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:20:45.3333178Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:20:45.3333530Z dist init r=1, world=2 2023-01-11T22:20:45.3333760Z dist init r=0, world=2 2023-01-11T22:20:45.3333999Z ok (5.413s) 2023-01-11T22:20:45.3334498Z test_grad_acc_configs_[(use_no_sync=True,num_iters=3),(use_no_sync=False,num_iters=3),(use_no_sync=True,num_iters=3)]_sharding_strategy_ShardingStrategy_FULL_SHARD_use_orig_params_False (__main__.TestGradAcc) 2023-01-11T22:20:45.3335132Z Tests gradient accumulation. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64286 2023-01-11T22:20:45.3335606Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64287 2023-01-11T22:20:45.3336214Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:20:45.3336927Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:20:45.3337520Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:20:45.3337974Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:20:45.3338546Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:20:45.3338993Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:20:45.3339543Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:20:45.3340006Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:20:45.3340457Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:20:45.3340950Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:20:45.3341588Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:20:45.3342272Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:20:45.3342784Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:20:45.3343354Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:20:45.3344337Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3345572Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3346796Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3348079Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:20:45.3349318Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3350534Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3351759Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3352945Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3353547Z dist init r=1, world=2 2023-01-11T22:20:45.3353799Z dist init r=0, world=2 2023-01-11T22:20:45.3354037Z ok (5.113s) 2023-01-11T22:20:45.3354513Z test_grad_acc_configs_[(use_no_sync=True,num_iters=3),(use_no_sync=False,num_iters=3),(use_no_sync=True,num_iters=3)]_sharding_strategy_ShardingStrategy_FULL_SHARD_use_orig_params_True (__main__.TestGradAcc) 2023-01-11T22:20:45.3355146Z Tests gradient accumulation. ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64369 2023-01-11T22:20:45.3355635Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64370 2023-01-11T22:20:45.3356246Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:20:45.3356677Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:20:45.3357250Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:20:45.3357714Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:20:45.3358270Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:20:45.3358789Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:20:45.3359365Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:20:45.3359823Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:20:45.3360255Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:20:45.3360753Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:20:45.3361410Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:20:45.3362094Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:20:45.3362593Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:20:45.3363063Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:20:45.3363421Z dist init r=1, world=2 2023-01-11T22:20:45.3363654Z dist init r=0, world=2 2023-01-11T22:20:45.3363945Z ok (5.413s) 2023-01-11T22:20:45.3364450Z test_grad_acc_configs_[(use_no_sync=True,num_iters=3),(use_no_sync=False,num_iters=3),(use_no_sync=True,num_iters=3)]_sharding_strategy_ShardingStrategy_NO_SHARD_use_orig_params_False (__main__.TestGradAcc) 2023-01-11T22:20:45.3365078Z Tests gradient accumulation. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64452 2023-01-11T22:20:45.3365550Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64453 2023-01-11T22:20:45.3366158Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:20:45.3366609Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:20:45.3367164Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:20:45.3367633Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:20:45.3368209Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:20:45.3368648Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:20:45.3369195Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:20:45.3369655Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:20:45.3370105Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:20:45.3370601Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:20:45.3371239Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:20:45.3371930Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:20:45.3372446Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:20:45.3372895Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:20:45.3373886Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3375117Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3376423Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3377994Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:20:45.3379223Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3380523Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3381765Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3382960Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3383569Z dist init r=1, world=2 2023-01-11T22:20:45.3383827Z dist init r=0, world=2 2023-01-11T22:20:45.3384068Z ok (5.115s) 2023-01-11T22:20:45.3384546Z test_grad_acc_configs_[(use_no_sync=True,num_iters=3),(use_no_sync=False,num_iters=3),(use_no_sync=True,num_iters=3)]_sharding_strategy_ShardingStrategy_NO_SHARD_use_orig_params_True (__main__.TestGradAcc) 2023-01-11T22:20:45.3385172Z Tests gradient accumulation. ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64535 2023-01-11T22:20:45.3385664Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64536 2023-01-11T22:20:45.3386269Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:20:45.3386701Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:20:45.3387280Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:20:45.3387751Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:20:45.3388311Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:20:45.3388756Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:20:45.3389323Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:20:45.3389785Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:20:45.3390218Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:20:45.3390707Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:20:45.3391358Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:20:45.3392119Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:20:45.3392644Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:20:45.3393114Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:20:45.3394107Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3395337Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3396610Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3397847Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3399074Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:20:45.3400290Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3401498Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3402702Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3403924Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3405125Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3406338Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3407593Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3408810Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3410018Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3411286Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3412506Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3413704Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3414918Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3416115Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3417613Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3418199Z dist init r=0, world=2 2023-01-11T22:20:45.3418455Z dist init r=1, world=2 2023-01-11T22:20:45.3418699Z ok (5.513s) 2023-01-11T22:20:45.3419184Z test_grad_acc_configs_[(use_no_sync=True,num_iters=3),(use_no_sync=False,num_iters=3),(use_no_sync=True,num_iters=3)]_sharding_strategy_ShardingStrategy_SHARD_GRAD_OP_use_orig_params_False (__main__.TestGradAcc) 2023-01-11T22:20:45.3419819Z Tests gradient accumulation. ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64618 2023-01-11T22:20:45.3420311Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64619 2023-01-11T22:20:45.3420919Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:20:45.3421355Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:20:45.3422029Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:20:45.3422501Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:20:45.3423076Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:20:45.3423503Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:20:45.3424068Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:20:45.3424529Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:20:45.3424961Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:20:45.3425459Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:20:45.3426111Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:20:45.3426861Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:20:45.3427379Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:20:45.3427847Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:20:45.3428890Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3430123Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3431350Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3432550Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3433770Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:20:45.3434985Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3436193Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3437407Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:20:45.3438080Z dist init r=1, world=2 2023-01-11T22:20:45.3438335Z dist init r=0, world=2 2023-01-11T22:20:45.3438557Z ok (5.113s) 2023-01-11T22:20:45.3439058Z test_grad_acc_configs_[(use_no_sync=True,num_iters=3),(use_no_sync=False,num_iters=3),(use_no_sync=True,num_iters=3)]_sharding_strategy_ShardingStrategy_SHARD_GRAD_OP_use_orig_params_True (__main__.TestGradAcc) 2023-01-11T22:20:45.3439689Z Tests gradient accumulation. ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64701 2023-01-11T22:20:45.3440164Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64702 2023-01-11T22:20:45.3440775Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:20:45.3441228Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:20:45.3441803Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:20:45.3442314Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:20:45.3442907Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:20:45.3443353Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:20:45.3443903Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:20:45.3444368Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:20:45.3444815Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:20:45.3445309Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:20:45.3445948Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:20:45.3446640Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:20:45.3447157Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:20:45.3447624Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:20:45.3447960Z dist init r=1, world=2 2023-01-11T22:20:45.3448212Z dist init r=0, world=2 2023-01-11T22:20:45.3448452Z ok (5.414s) 2023-01-11T22:20:45.3448584Z 2023-01-11T22:20:45.3448856Z ---------------------------------------------------------------------- 2023-01-11T22:20:45.3449184Z Ran 12 tests in 64.819s 2023-01-11T22:20:45.3449346Z 2023-01-11T22:20:45.3449439Z OK 2023-01-11T22:20:45.3449573Z 2023-01-11T22:20:45.3449681Z Generating XML reports... 2023-01-11T22:20:45.3450258Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_grad_acc/TEST-TestGradAcc-20230111221940.xml 2023-01-11T22:20:45.3450590Z 2023-01-11T22:20:45.3450978Z ##[endgroup] 2023-01-11T22:20:45.3451582Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_grad_acc (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_grad_acc_nii6a7yt) 2023-01-11T22:20:45.3451921Z 2023-01-11T22:20:45.3452184Z Running distributed/fsdp/test_fsdp_misc ... [2023-01-11 22:20:45.321480] 2023-01-11T22:20:45.3452848Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_fsdp_misc.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:20:45.321777] 2023-01-11T22:22:03.3238302Z 2023-01-11T22:22:03.3238799Z Expand the folded group to see the log file of distributed/fsdp/test_fsdp_misc 2023-01-11T22:22:03.3242429Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_fsdp_misc (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_misc_7s34916l) 2023-01-11T22:22:03.3243050Z 2023-01-11T22:22:03.3243168Z Running tests... 
2023-01-11T22:22:03.3243682Z ---------------------------------------------------------------------- 2023-01-11T22:22:03.3244238Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_misc 2023-01-11T22:22:03.3244661Z test_cpu_init_with_sync_module_states (__main__.TestFSDPMisc) 2023-01-11T22:22:03.3247003Z Tests that passing ``sync_module_states=True`` raises an error for ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:22:03.3247552Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64819 2023-01-11T22:22:03.3248009Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64820 2023-01-11T22:22:03.3248690Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:22:03.3249147Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:22:03.3249718Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:22:03.3252265Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:22:03.3253143Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:22:03.3253642Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:22:03.3254237Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:22:03.3254698Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:22:03.3255160Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:22:03.3255663Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:22:03.3256357Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier 
for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:22:03.3257636Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:22:03.3258358Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:22:03.3258971Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:22:03.3260258Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:22:03.3261106Z warnings.warn( 2023-01-11T22:22:03.3262268Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:22:03.3263051Z warnings.warn( 2023-01-11T22:22:03.3263307Z dist init r=0, world=2 2023-01-11T22:22:03.3263539Z dist init r=1, world=2 2023-01-11T22:22:03.3263775Z ok (5.504s) 2023-01-11T22:22:03.3264068Z test_device_id_auto_wrap (__main__.TestFSDPMisc) 2023-01-11T22:22:03.3264541Z Tests that ``auto_wrap_policy`` propagates ``device_id`` to all ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64898 2023-01-11T22:22:03.3265073Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64899 2023-01-11T22:22:03.3265861Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:22:03.3266315Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:22:03.3266871Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:22:03.3267339Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:22:03.3267910Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:22:03.3268351Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:22:03.3268898Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:22:03.3269353Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:22:03.3269809Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:22:03.3270369Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:22:03.3271049Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:22:03.3271734Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:22:03.3272253Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:22:03.3272705Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:22:03.3273054Z dist init r=1, world=2 2023-01-11T22:22:03.3273306Z dist init r=0, world=2 2023-01-11T22:22:03.3273526Z ok (4.010s) 2023-01-11T22:22:03.3273824Z test_fsdp_cpu_init_stays_on_cpu (__main__.TestFSDPMisc) 2023-01-11T22:22:03.3274337Z Tests that passing a CPU module to FSDP preserves that the wrapped ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64977 2023-01-11T22:22:03.3274877Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64978 2023-01-11T22:22:03.3275471Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:22:03.3275919Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:22:03.3276494Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:22:03.3276943Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:22:03.3277513Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:22:03.3277952Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:22:03.3278522Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:22:03.3278968Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:22:03.3279421Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:22:03.3279911Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 
2023-01-11T22:22:03.3280562Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:22:03.3281224Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:22:03.3281737Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:22:03.3282198Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:22:03.3282609Z dist init r=0, world=2 2023-01-11T22:22:03.3282859Z dist init r=1, world=2 2023-01-11T22:22:03.3283098Z ok (4.412s) 2023-01-11T22:22:03.3283384Z test_fsdp_device_id_cpu_offload (__main__.TestFSDPMisc) 2023-01-11T22:22:03.3283873Z Ensures that even if device_id is specified but we have ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65060 2023-01-11T22:22:03.3284388Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 65061 2023-01-11T22:22:03.3284994Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:22:03.3285421Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:22:03.3285988Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:22:03.3286450Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:22:03.3287029Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:22:03.3287508Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:22:03.3288087Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:22:03.3288545Z warnings.warn(f"loaded 
{len(disabled_tests_dict)} disabled tests") 2023-01-11T22:22:03.3288974Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:22:03.3289462Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:22:03.3290110Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:22:03.3290792Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:22:03.3291294Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:22:03.3291763Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:22:03.3292116Z dist init r=1, world=2 2023-01-11T22:22:03.3292348Z dist init r=0, world=2 2023-01-11T22:22:03.3292586Z ok (3.911s) 2023-01-11T22:22:03.3292890Z test_fsdp_device_id_use_index_False (__main__.TestFSDPMisc) 2023-01-11T22:22:03.3293342Z Tests the FSDP ``device_id`` argument: ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65139 2023-01-11T22:22:03.3293840Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 65140 2023-01-11T22:22:03.3294452Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:22:03.3294899Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:22:03.3295453Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:22:03.3295914Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:22:03.3296490Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:22:03.3297342Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:22:03.3297903Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:22:03.3298361Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:22:03.3298809Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:22:03.3299282Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:22:03.3299937Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:22:03.3300739Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:22:03.3301256Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:22:03.3301702Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:22:03.3302047Z dist init r=1, world=2 2023-01-11T22:22:03.3302298Z dist init r=0, world=2 2023-01-11T22:22:03.3302518Z ok (3.911s) 2023-01-11T22:22:03.3302820Z test_fsdp_device_id_use_index_True (__main__.TestFSDPMisc) 2023-01-11T22:22:03.3303285Z Tests the FSDP ``device_id`` argument: ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65218 2023-01-11T22:22:03.3303776Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 65219 2023-01-11T22:22:03.3304367Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:22:03.3304809Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:22:03.3305449Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:22:03.3305915Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:22:03.3306495Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:22:03.3306935Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:22:03.3307498Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:22:03.3307940Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:22:03.3308390Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:22:03.3308887Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:22:03.3309543Z 
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:22:03.3310207Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:22:03.3310726Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:22:03.3311192Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:22:03.3311526Z dist init r=0, world=2 2023-01-11T22:22:03.3311780Z dist init r=1, world=2 2023-01-11T22:22:03.3312018Z ok (4.011s) 2023-01-11T22:22:03.3312484Z test_fsdp_module_no_compute_grad_use_second_layer_False_sharding_strategy_None (__main__.TestFSDPMisc) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65297 2023-01-11T22:22:03.3313066Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 65298 2023-01-11T22:22:03.3313675Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:22:03.3314126Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:22:03.3314674Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:22:03.3315139Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:22:03.3315712Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:22:03.3316153Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:22:03.3316699Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:22:03.3317236Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:22:03.3317687Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:22:03.3318158Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:22:03.3318812Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:22:03.3319493Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:22:03.3320005Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:22:03.3320453Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:22:03.3320798Z dist init r=0, world=2 2023-01-11T22:22:03.3321048Z dist init r=1, world=2 2023-01-11T22:22:03.3321274Z ok (4.412s) 2023-01-11T22:22:03.3321878Z test_fsdp_module_no_compute_grad_use_second_layer_False_sharding_strategy_ShardingStrategy_NO_SHARD (__main__.TestFSDPMisc) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65380 2023-01-11T22:22:03.3322493Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 65381 2023-01-11T22:22:03.3323101Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:22:03.3323528Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:22:03.3324098Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:22:03.3324561Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:22:03.3325129Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:22:03.3325554Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:22:03.3326124Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:22:03.3326679Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:22:03.3327130Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:22:03.3327621Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:22:03.3328254Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:22:03.3328937Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:22:03.3329451Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:22:03.3329919Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:22:03.3330252Z dist init r=1, world=2 2023-01-11T22:22:03.3330503Z dist init r=0, world=2 2023-01-11T22:22:03.3330744Z ok (4.412s) 2023-01-11T22:22:03.3331212Z test_fsdp_module_no_compute_grad_use_second_layer_True_sharding_strategy_None (__main__.TestFSDPMisc) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65463 2023-01-11T22:22:03.3331844Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 65464 2023-01-11T22:22:03.3332457Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:22:03.3332900Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:22:03.3333455Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:22:03.3333991Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:22:03.3334564Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:22:03.3335004Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:22:03.3335553Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:22:03.3336010Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:22:03.3336456Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:22:03.3337320Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:22:03.3337978Z 
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:22:03.3338662Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:22:03.3339179Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:22:03.3339712Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:22:03.3340076Z dist init r=0, world=2 2023-01-11T22:22:03.3340326Z dist init r=1, world=2 2023-01-11T22:22:03.3340546Z ok (4.512s) 2023-01-11T22:22:03.3341061Z test_fsdp_module_no_compute_grad_use_second_layer_True_sharding_strategy_ShardingStrategy_NO_SHARD (__main__.TestFSDPMisc) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65546 2023-01-11T22:22:03.3341658Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 65547 2023-01-11T22:22:03.3342271Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:22:03.3342705Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:22:03.3343281Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:22:03.3343751Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:22:03.3344307Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:22:03.3344750Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:22:03.3345319Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:22:03.3345775Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 
2023-01-11T22:22:03.3346201Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:22:03.3346687Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:22:03.3347345Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:22:03.3348030Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:22:03.3348529Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:22:03.3348993Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:22:03.3349344Z dist init r=0, world=2 2023-01-11T22:22:03.3349577Z dist init r=1, world=2 2023-01-11T22:22:03.3349814Z ok (4.512s) 2023-01-11T22:22:03.3350229Z test_fsdp_namedtuple (__main__.TestFSDPMisc) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65629 2023-01-11T22:22:03.3350738Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 65630 2023-01-11T22:22:03.3351333Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:22:03.3351865Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:22:03.3352442Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:22:03.3352892Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:22:03.3353462Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:22:03.3353902Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:22:03.3354467Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:22:03.3354912Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:22:03.3355357Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:22:03.3355852Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:22:03.3356538Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:22:03.3357234Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:22:03.3357748Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:22:03.3358216Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:22:03.3358551Z dist init r=0, world=2 2023-01-11T22:22:03.3358800Z dist init r=1, world=2 2023-01-11T22:22:03.3359040Z ok (3.911s) 2023-01-11T22:22:03.3359451Z test_fsdp_not_all_outputs_used_in_loss (__main__.TestFSDPMisc) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65708 2023-01-11T22:22:03.3359980Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 65709 2023-01-11T22:22:03.3360591Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:22:03.3361034Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:22:03.3361584Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:22:03.3362045Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:22:03.3362618Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:22:03.3363057Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:22:03.3363606Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:22:03.3364064Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:22:03.3364508Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:22:03.3364983Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:22:03.3365629Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based 
barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:22:03.3366308Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:22:03.3366820Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:22:03.3367271Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:22:03.3368079Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_misc.py:113: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:22:03.3368875Z self.assertEqual(full_param.storage().size(), 0) 2023-01-11T22:22:03.3369623Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_misc.py:113: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:22:03.3370325Z self.assertEqual(full_param.storage().size(), 0) 2023-01-11T22:22:03.3370604Z dist init r=0, world=2 2023-01-11T22:22:03.3370854Z dist init r=1, world=2 2023-01-11T22:22:03.3371094Z ok (4.613s) 2023-01-11T22:22:03.3371379Z test_fsdp_same_model_across_ranks (__main__.TestFSDPMisc) 2023-01-11T22:22:03.3371889Z FSDP broadcasts model from rank 0 to ensure it starts off with the same ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65791 2023-01-11T22:22:03.3372523Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 65792 2023-01-11T22:22:03.3373136Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:22:03.3373581Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:22:03.3374151Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:22:03.3374618Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:22:03.3375173Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:22:03.3375613Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:22:03.3376181Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:22:03.3376982Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:22:03.3377426Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:22:03.3377915Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:22:03.3378575Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:22:03.3379239Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:22:03.3379754Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:22:03.3380221Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:22:03.3380577Z dist init r=0, world=2 2023-01-11T22:22:03.3380807Z dist init r=1, world=2 2023-01-11T22:22:03.3381045Z ok (3.911s) 2023-01-11T22:22:03.3381345Z test_homogeneous_attributes (__main__.TestFSDPMisc) 2023-01-11T22:22:03.3381836Z Tests that passing heterogeneous values for attributes designated as ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65870 2023-01-11T22:22:03.3382374Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 65871 2023-01-11T22:22:03.3382979Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:22:03.3383424Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:22:03.3383980Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:22:03.3384444Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:22:03.3385130Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:22:03.3385559Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:22:03.3386127Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:22:03.3386588Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:22:03.3387039Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:22:03.3387511Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 
2023-01-11T22:22:03.3388156Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:22:03.3388838Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:22:03.3389359Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:22:03.3389880Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:22:03.3390240Z dist init r=0, world=2 2023-01-11T22:22:03.3390489Z dist init r=1, world=2 2023-01-11T22:22:03.3390708Z ok (4.011s) 2023-01-11T22:22:03.3391019Z test_module_device_mismatches_device_id (__main__.TestFSDPMisc) 2023-01-11T22:22:03.3391528Z Tests that specifying a ``device_id`` argument to FSDP for a GPU ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65949 2023-01-11T22:22:03.3392035Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 65950 2023-01-11T22:22:03.3392649Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:22:03.3393097Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:22:03.3393671Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:22:03.3394121Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:22:03.3394693Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:22:03.3395133Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:22:03.3395698Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:22:03.3396140Z 
warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:22:03.3396585Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:22:03.3397075Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:22:03.3397714Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:22:03.3398402Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:22:03.3398916Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:22:03.3399381Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:22:03.3399719Z dist init r=0, world=2 2023-01-11T22:22:03.3399969Z dist init r=1, world=2 2023-01-11T22:22:03.3400209Z ok (3.910s) 2023-01-11T22:22:03.3400491Z test_multi_device_not_supported (__main__.TestFSDPMisc) 2023-01-11T22:22:03.3401125Z Tests that wrapping a multi-device module (i.e. with submodules on ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 66028 2023-01-11T22:22:03.3401663Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 66029 2023-01-11T22:22:03.3402336Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:22:03.3402767Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:22:03.3403336Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:22:03.3403795Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:22:03.3404346Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:22:03.3404784Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:22:03.3405351Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:22:03.3405808Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:22:03.3406241Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:22:03.3406948Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:22:03.3407494Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:22:03.3408141Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:22:03.3408637Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:22:03.3409105Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:22:03.3409457Z dist init r=1, world=2 2023-01-11T22:22:03.3409688Z dist init r=0, world=2 2023-01-11T22:22:03.3409928Z ok (4.010s) 2023-01-11T22:22:03.3410199Z test_no_params (__main__.TestFSDPMisc) 2023-01-11T22:22:03.3410657Z Test that device_id and cpu init work if module has no params ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 66107 2023-01-11T22:22:03.3411181Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 66108 2023-01-11T22:22:03.3411785Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:22:03.3412230Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:22:03.3412782Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:22:03.3413245Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:22:03.3413816Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:22:03.3414236Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:22:03.3414808Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:22:03.3415265Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:22:03.3415719Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:22:03.3416192Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:22:03.3417191Z 
INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:22:03.3417886Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:22:03.3418404Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:22:03.3418851Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:22:03.3419294Z dist init r=1, world=2 2023-01-11T22:22:03.3419545Z dist init r=0, world=2 2023-01-11T22:22:03.3419766Z ok (3.910s) 2023-01-11T22:22:03.3420115Z test_world_size_1_sharding_strategy_warning (__main__.TestFSDPMiscWorldSize1) 2023-01-11T22:22:03.3420660Z Tests that FSDP issues a warning when it switches to using ``NO_SHARD`` ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 66186 2023-01-11T22:22:03.3421351Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:22:03.3421780Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:22:03.3422349Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:22:03.3422815Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:22:03.3423244Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:22:03.3423900Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 
2023-01-11T22:22:03.3424489Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:22:03.3424857Z dist init r=0, world=1 2023-01-11T22:22:03.3425082Z ok (3.809s) 2023-01-11T22:22:03.3425231Z 2023-01-11T22:22:03.3425504Z ---------------------------------------------------------------------- 2023-01-11T22:22:03.3425836Z Ran 18 tests in 75.694s 2023-01-11T22:22:03.3425997Z 2023-01-11T22:22:03.3426072Z OK 2023-01-11T22:22:03.3426206Z 2023-01-11T22:22:03.3426331Z Generating XML reports... 2023-01-11T22:22:03.3426899Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_misc/TEST-TestFSDPMisc-20230111222047.xml 2023-01-11T22:22:03.3427657Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_misc/TEST-TestFSDPMiscWorldSize1-20230111222047.xml 2023-01-11T22:22:03.3428002Z 2023-01-11T22:22:03.3428417Z ##[endgroup] 2023-01-11T22:22:03.3429012Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_misc (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_misc_7s34916l) 2023-01-11T22:22:03.3429356Z 2023-01-11T22:22:03.3429604Z Running distributed/fsdp/test_wrap ... [2023-01-11 22:22:03.324203] 2023-01-11T22:22:03.3430235Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_wrap.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:22:03.324473] 2023-01-11T22:23:45.2962068Z 2023-01-11T22:23:45.2965104Z Expand the folded group to see the log file of distributed/fsdp/test_wrap 2023-01-11T22:23:45.2966682Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_wrap (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_wrap_r4nee643) 2023-01-11T22:23:45.2967359Z 2023-01-11T22:23:45.2967840Z Running tests... 
2023-01-11T22:23:45.2968746Z ---------------------------------------------------------------------- 2023-01-11T22:23:45.2969661Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_wrap 2023-01-11T22:23:45.2981123Z test_always_wrap (__main__.TestAutoWrap) 2023-01-11T22:23:45.2981566Z Test to ensure that if `always_wrap_policy` is ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:23:45.2982424Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2023-01-11T22:23:45.2982918Z warnings.warn( 2023-01-11T22:23:45.2983139Z ok (1.599s) 2023-01-11T22:23:45.2984507Z test_always_wrap_with_ignored_modules_wrap_method_WrapMethod_FSDP_CTOR (__main__.TestAutoWrap) ... /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:23:45.2985787Z warnings.warn( 2023-01-11T22:23:45.2986030Z ok (0.004s) 2023-01-11T22:23:45.2987118Z test_always_wrap_with_ignored_modules_wrap_method_WrapMethod_WRAP_API (__main__.TestAutoWrap) ... [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:23:45.2987826Z ok (0.004s) 2023-01-11T22:23:45.2988131Z test_auto_wrap_api (__main__.TestAutoWrap) 2023-01-11T22:23:45.2988532Z Test to ensure with auto wrap, we wrap child modules correctly based on the min_num_params. ... ok (0.003s) 2023-01-11T22:23:45.2988965Z test_auto_wrap_preset_exclude_wrap (__main__.TestAutoWrap) 2023-01-11T22:23:45.2989429Z Test to ensure excluded modules are not wrapped, regardless if the total param size is greater than the ... ok (0.002s) 2023-01-11T22:23:45.2990004Z test_auto_wrap_preset_exclude_wrap_include_children (__main__.TestAutoWrap) 2023-01-11T22:23:45.2991203Z Test to ensure excluded modules are not wrapped, but children are if param size is greater than ... [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:23:45.2992566Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:23:45.2993825Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:23:45.2994420Z ok (0.002s) 2023-01-11T22:23:45.2994725Z test_auto_wrap_preset_force_leaf (__main__.TestAutoWrap) 2023-01-11T22:23:45.2995240Z Test to ensure force-leaf modules are not wrapped, and children are not wrapped. The ... 
ok (0.003s) 2023-01-11T22:23:45.2995680Z test_auto_wrap_preset_force_leaf_custom (__main__.TestAutoWrap) 2023-01-11T22:23:45.2996140Z Test to ensure force-leaf modules are not wrapped. ... ok (0.002s) 2023-01-11T22:23:45.2996785Z test_auto_wrap_smoke_test_cuda_init_mode_CUDAInitMode_CUDA_AFTER_cpu_offload_CPUOffload(offload_params=False)_use_device_id_False (__main__.TestAutoWrap) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:23:45.2997648Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:23:45.2998695Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:23:45.2999925Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:23:45.3000518Z ok (0.454s) 2023-01-11T22:23:45.3001175Z test_auto_wrap_smoke_test_cuda_init_mode_CUDAInitMode_CUDA_AFTER_cpu_offload_CPUOffload(offload_params=False)_use_device_id_True (__main__.TestAutoWrap) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:23:45.3002006Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 
2023-01-11T22:23:45.3002402Z ok (0.042s) 2023-01-11T22:23:45.3002849Z test_auto_wrap_smoke_test_cuda_init_mode_CUDAInitMode_CUDA_AFTER_cpu_offload_CPUOffload(offload_params=True)_use_device_id_False (__main__.TestAutoWrap) ... ok (0.002s) 2023-01-11T22:23:45.3003481Z test_auto_wrap_smoke_test_cuda_init_mode_CUDAInitMode_CUDA_AFTER_cpu_offload_CPUOffload(offload_params=True)_use_device_id_True (__main__.TestAutoWrap) ... ok (0.002s) 2023-01-11T22:23:45.3004224Z test_auto_wrap_smoke_test_cuda_init_mode_CUDAInitMode_CUDA_BEFORE_cpu_offload_CPUOffload(offload_params=False)_use_device_id_False (__main__.TestAutoWrap) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:23:45.3005126Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:23:45.3006173Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:23:45.3007408Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:23:45.3008002Z ok (0.044s) 2023-01-11T22:23:45.3008564Z test_auto_wrap_smoke_test_cuda_init_mode_CUDAInitMode_CUDA_BEFORE_cpu_offload_CPUOffload(offload_params=False)_use_device_id_True (__main__.TestAutoWrap) ... 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:23:45.3009400Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:23:45.3010049Z ok (0.043s) 2023-01-11T22:23:45.3010625Z test_auto_wrap_smoke_test_cuda_init_mode_CUDAInitMode_CUDA_BEFORE_cpu_offload_CPUOffload(offload_params=True)_use_device_id_False (__main__.TestAutoWrap) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:23:45.3011449Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:23:45.3012490Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:23:45.3013727Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:23:45.3014320Z ok (0.058s) 2023-01-11T22:23:45.3014893Z test_auto_wrap_smoke_test_cuda_init_mode_CUDAInitMode_CUDA_BEFORE_cpu_offload_CPUOffload(offload_params=True)_use_device_id_True (__main__.TestAutoWrap) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:23:45.3015714Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 
2023-01-11T22:23:45.3016178Z ok (0.043s) 2023-01-11T22:23:45.3016903Z test_auto_wrap_with_ignored_modules_wrap_method_WrapMethod_FSDP_CTOR (__main__.TestAutoWrap) ... ok (0.003s) 2023-01-11T22:23:45.3017439Z test_auto_wrap_with_ignored_modules_wrap_method_WrapMethod_WRAP_API (__main__.TestAutoWrap) ... ok (0.003s) 2023-01-11T22:23:45.3017834Z test_module_wrap_policy (__main__.TestAutoWrap) 2023-01-11T22:23:45.3018851Z Tests the ``ModuleWrapPolicy``. ... [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:23:45.3020131Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:23:45.3021452Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:23:45.3022705Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:23:45.3023295Z ok (0.026s) 2023-01-11T22:23:45.3023586Z test_transformer_auto_wrap_policy (__main__.TestAutoWrap) 2023-01-11T22:23:45.3024610Z Tests the ``transformer_auto_wrap_policy``. ... 
[W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:23:45.3025876Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:23:45.3027089Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:23:45.3028302Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:23:45.3029525Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:23:45.3030715Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:23:45.3031932Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:23:45.3033239Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:23:45.3034451Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:23:45.3035767Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:23:45.3036987Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:23:45.3038204Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:23:45.3038791Z ok (0.046s) 2023-01-11T22:23:45.3039103Z test_wrap_disabled_outside_context (__main__.TestAutoWrap) ... ok (0.002s) 2023-01-11T22:23:45.3040173Z test_wrap_override_defaults (__main__.TestAutoWrap) ... [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:23:45.3040833Z ok (0.002s) 2023-01-11T22:23:45.3041172Z test_wrap_wrap_method_WrapMethod_FSDP_CTOR (__main__.TestAutoWrap) ... ok (0.002s) 2023-01-11T22:23:45.3041586Z test_wrap_wrap_method_WrapMethod_WRAP_API (__main__.TestAutoWrap) ... ok (0.002s) 2023-01-11T22:23:45.3041994Z test_bn_always_wrapped_individually (__main__.TestFSDPWrap) 2023-01-11T22:23:45.3042508Z Ensures that by using _or_policy with _wrap_batchnorm_individually, even ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 66293 2023-01-11T22:23:45.3043032Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 66294 2023-01-11T22:23:45.3043648Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3044096Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3044669Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3045120Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3045693Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3046136Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3046704Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3047221Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3047671Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:23:45.3048167Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:23:45.3048808Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:23:45.3049492Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:23:45.3050013Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:23:45.3050486Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:23:45.3051784Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:23:45.3052575Z warnings.warn( 2023-01-11T22:23:45.3053721Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:23:45.3054493Z warnings.warn( 2023-01-11T22:23:45.3054744Z dist init r=0, world=2 2023-01-11T22:23:45.3054974Z dist init r=1, world=2 2023-01-11T22:23:45.3055214Z ok (4.012s) 2023-01-11T22:23:45.3055591Z test_error_already_wrapped_nested_False_cuda_init_mode_CUDAInitMode_CUDA_AFTER (__main__.TestFSDPWrap) 2023-01-11T22:23:45.3056142Z Test that an error is raised if we attempt to wrap when submodules are ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 66372 2023-01-11T22:23:45.3057039Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 66373 2023-01-11T22:23:45.3057673Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3058125Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3058682Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3059151Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3059724Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3060170Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3060726Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3061187Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3061635Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:23:45.3062109Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:23:45.3062762Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:23:45.3063446Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:23:45.3064082Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:23:45.3064532Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:23:45.3065797Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:23:45.3066563Z warnings.warn( 2023-01-11T22:23:45.3067712Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:23:45.3068572Z warnings.warn( 2023-01-11T22:23:45.3068820Z dist init r=0, world=2 2023-01-11T22:23:45.3069073Z dist init r=1, world=2 2023-01-11T22:23:45.3069312Z ok (3.910s) 2023-01-11T22:23:45.3069670Z test_error_already_wrapped_nested_False_cuda_init_mode_CUDAInitMode_CUDA_BEFORE (__main__.TestFSDPWrap) 2023-01-11T22:23:45.3070240Z Test that an error is raised if we attempt to wrap when submodules are ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 66451 2023-01-11T22:23:45.3070776Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 66452 2023-01-11T22:23:45.3071388Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3071818Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3072397Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3072870Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3073442Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3073864Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3074434Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3074890Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3075320Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:23:45.3075807Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:23:45.3076466Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:23:45.3077153Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:23:45.3077653Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:23:45.3078121Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:23:45.3078472Z dist init r=1, world=2 2023-01-11T22:23:45.3078706Z dist init r=0, world=2 2023-01-11T22:23:45.3078941Z ok (3.910s) 2023-01-11T22:23:45.3079316Z test_error_already_wrapped_nested_True_cuda_init_mode_CUDAInitMode_CUDA_AFTER (__main__.TestFSDPWrap) 2023-01-11T22:23:45.3079888Z Test that an error is raised if we attempt to wrap when submodules are ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 66530 2023-01-11T22:23:45.3080477Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 66531 2023-01-11T22:23:45.3081096Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3081548Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3082103Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3082567Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3083139Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3083578Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3084125Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3084589Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3085035Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:23:45.3085581Z INFO:torch.distributed.distributed_c10d:Added key: 
store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:23:45.3086233Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:23:45.3086910Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:23:45.3087430Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:23:45.3087880Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:23:45.3089141Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:23:45.3089923Z warnings.warn( 2023-01-11T22:23:45.3091064Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:23:45.3091832Z warnings.warn( 2023-01-11T22:23:45.3092065Z dist init r=0, world=2 2023-01-11T22:23:45.3092315Z dist init r=1, world=2 2023-01-11T22:23:45.3092559Z ok (3.910s) 2023-01-11T22:23:45.3092931Z test_error_already_wrapped_nested_True_cuda_init_mode_CUDAInitMode_CUDA_BEFORE (__main__.TestFSDPWrap) 2023-01-11T22:23:45.3093489Z Test that an error is raised if we attempt to wrap when submodules are ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 66609 2023-01-11T22:23:45.3094023Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 66610 2023-01-11T22:23:45.3094633Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3095063Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3095636Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3096099Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3096999Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3097545Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3098135Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3098598Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3099031Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:23:45.3099523Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:23:45.3100170Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:23:45.3100852Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:23:45.3101349Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:23:45.3101818Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:23:45.3102282Z dist init r=0, world=2 2023-01-11T22:23:45.3102551Z dist init r=1, world=2 2023-01-11T22:23:45.3102771Z ok (4.010s) 2023-01-11T22:23:45.3103401Z test_main_wrap_api_cpu_offload_CPUOffload(offload_params=False)_backward_prefetch_BackwardPrefetch_BACKWARD_POST_forward_prefetch_False_cuda_init_mode_CUDAInitMode_CUDA_AFTER (__main__.TestFSDPWrap) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 66688 2023-01-11T22:23:45.3104108Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 66689 2023-01-11T22:23:45.3104704Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3105154Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3105733Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3106199Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3106753Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3107198Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3107761Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3108222Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3108651Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:23:45.3109138Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:23:45.3109787Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:23:45.3110454Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:23:45.3110971Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:23:45.3111436Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:23:45.3112695Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:23:45.3113540Z warnings.warn( 2023-01-11T22:23:45.3114669Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:23:45.3115439Z warnings.warn( 2023-01-11T22:23:45.3115691Z dist init r=1, world=2 2023-01-11T22:23:45.3115944Z dist init r=0, world=2 2023-01-11T22:23:45.3116164Z ok (4.513s) 2023-01-11T22:23:45.3116792Z test_main_wrap_api_cpu_offload_CPUOffload(offload_params=False)_backward_prefetch_BackwardPrefetch_BACKWARD_POST_forward_prefetch_False_cuda_init_mode_CUDAInitMode_CUDA_BEFORE (__main__.TestFSDPWrap) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 66771 2023-01-11T22:23:45.3117504Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 66772 2023-01-11T22:23:45.3118172Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3118613Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3119186Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3119654Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3120211Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3120651Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3121216Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3121674Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3122110Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:23:45.3122608Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:23:45.3123260Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:23:45.3123943Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:23:45.3124444Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:23:45.3124911Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:23:45.3125263Z dist init r=0, world=2 2023-01-11T22:23:45.3125496Z dist init r=1, world=2 2023-01-11T22:23:45.3125735Z ok (4.513s) 2023-01-11T22:23:45.3126368Z test_main_wrap_api_cpu_offload_CPUOffload(offload_params=False)_backward_prefetch_BackwardPrefetch_BACKWARD_POST_forward_prefetch_True_cuda_init_mode_CUDAInitMode_CUDA_AFTER (__main__.TestFSDPWrap) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 66854 2023-01-11T22:23:45.3127076Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 66855 2023-01-11T22:23:45.3127669Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3128116Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3128687Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3129134Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3129704Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3130219Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3130796Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3131238Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3131690Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:23:45.3132180Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:23:45.3132833Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:23:45.3133498Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:23:45.3134015Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:23:45.3134486Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:23:45.3135864Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:23:45.3136892Z warnings.warn( 2023-01-11T22:23:45.3138061Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:23:45.3138839Z warnings.warn( 2023-01-11T22:23:45.3139090Z dist init r=0, world=2 2023-01-11T22:23:45.3139325Z dist init r=1, world=2 2023-01-11T22:23:45.3139565Z ok (4.413s) 2023-01-11T22:23:45.3140190Z test_main_wrap_api_cpu_offload_CPUOffload(offload_params=False)_backward_prefetch_BackwardPrefetch_BACKWARD_POST_forward_prefetch_True_cuda_init_mode_CUDAInitMode_CUDA_BEFORE (__main__.TestFSDPWrap) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 66937 2023-01-11T22:23:45.3140896Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 66938 2023-01-11T22:23:45.3141483Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3141929Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3142509Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3142975Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3143536Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3143977Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3144540Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3144986Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3145434Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:23:45.3145924Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:23:45.3146574Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:23:45.3147356Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:23:45.3147874Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:23:45.3148342Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:23:45.3148701Z dist init r=1, world=2 2023-01-11T22:23:45.3148935Z dist init r=0, world=2 2023-01-11T22:23:45.3149175Z ok (4.513s) 2023-01-11T22:23:45.3149806Z test_main_wrap_api_cpu_offload_CPUOffload(offload_params=False)_backward_prefetch_BackwardPrefetch_BACKWARD_PRE_forward_prefetch_False_cuda_init_mode_CUDAInitMode_CUDA_AFTER (__main__.TestFSDPWrap) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 67020 2023-01-11T22:23:45.3150495Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 67021 2023-01-11T22:23:45.3151111Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3151637Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3152230Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3152685Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3153260Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3153699Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3154246Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3154707Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3155162Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:23:45.3155654Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:23:45.3156293Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:23:45.3156974Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:23:45.3157496Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:23:45.3157963Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:23:45.3159205Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:23:45.3159972Z warnings.warn( 2023-01-11T22:23:45.3161126Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:23:45.3161895Z warnings.warn( 2023-01-11T22:23:45.3162144Z dist init r=0, world=2 2023-01-11T22:23:45.3162378Z dist init r=1, world=2 2023-01-11T22:23:45.3162619Z ok (4.513s) 2023-01-11T22:23:45.3163250Z test_main_wrap_api_cpu_offload_CPUOffload(offload_params=False)_backward_prefetch_BackwardPrefetch_BACKWARD_PRE_forward_prefetch_False_cuda_init_mode_CUDAInitMode_CUDA_BEFORE (__main__.TestFSDPWrap) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 67103 2023-01-11T22:23:45.3164016Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 67104 2023-01-11T22:23:45.3164630Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3165080Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3165651Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3166100Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3166676Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3167123Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3167689Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3168211Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3168677Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:23:45.3169170Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:23:45.3169810Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:23:45.3170490Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:23:45.3171007Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:23:45.3171483Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:23:45.3171818Z dist init r=0, world=2 2023-01-11T22:23:45.3172068Z dist init r=1, world=2 2023-01-11T22:23:45.3172309Z ok (4.513s) 2023-01-11T22:23:45.3172917Z test_main_wrap_api_cpu_offload_CPUOffload(offload_params=False)_backward_prefetch_BackwardPrefetch_BACKWARD_PRE_forward_prefetch_True_cuda_init_mode_CUDAInitMode_CUDA_AFTER (__main__.TestFSDPWrap) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 67186 2023-01-11T22:23:45.3173623Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 67187 2023-01-11T22:23:45.3174232Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3174677Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3175229Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3175699Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3176277Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3177077Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3177640Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3178104Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3178554Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:23:45.3179028Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:23:45.3179684Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:23:45.3180483Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:23:45.3181005Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:23:45.3181456Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:23:45.3182711Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:23:45.3183476Z warnings.warn( 2023-01-11T22:23:45.3184711Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:23:45.3185500Z warnings.warn( 2023-01-11T22:23:45.3185732Z dist init r=1, world=2 2023-01-11T22:23:45.3185982Z dist init r=0, world=2 2023-01-11T22:23:45.3186223Z ok (4.513s) 2023-01-11T22:23:45.3186851Z test_main_wrap_api_cpu_offload_CPUOffload(offload_params=False)_backward_prefetch_BackwardPrefetch_BACKWARD_PRE_forward_prefetch_True_cuda_init_mode_CUDAInitMode_CUDA_BEFORE (__main__.TestFSDPWrap) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 67269 2023-01-11T22:23:45.3187541Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 67270 2023-01-11T22:23:45.3188156Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3188608Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3189157Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3189629Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3190199Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3190637Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3191187Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3191646Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3192094Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:23:45.3192590Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:23:45.3193235Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:23:45.3193920Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:23:45.3194438Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:23:45.3194887Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:23:45.3195240Z dist init r=1, world=2 2023-01-11T22:23:45.3195490Z dist init r=0, world=2 2023-01-11T22:23:45.3195729Z ok (4.513s) 2023-01-11T22:23:45.3196334Z test_main_wrap_api_cpu_offload_CPUOffload(offload_params=True)_backward_prefetch_BackwardPrefetch_BACKWARD_POST_forward_prefetch_False_cuda_init_mode_CUDAInitMode_CUDA_AFTER (__main__.TestFSDPWrap) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 67352 2023-01-11T22:23:45.3197121Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 67353 2023-01-11T22:23:45.3197734Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3198181Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3198732Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3199192Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3199758Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3200175Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3200745Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3201259Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3201720Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:23:45.3202195Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:23:45.3202856Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:23:45.3203537Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:23:45.3204058Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:23:45.3204509Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:23:45.3204863Z dist init r=1, world=2 2023-01-11T22:23:45.3205113Z dist init r=0, world=2 2023-01-11T22:23:45.3205333Z ok (3.912s) 2023-01-11T22:23:45.3205964Z test_main_wrap_api_cpu_offload_CPUOffload(offload_params=True)_backward_prefetch_BackwardPrefetch_BACKWARD_POST_forward_prefetch_False_cuda_init_mode_CUDAInitMode_CUDA_BEFORE (__main__.TestFSDPWrap) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 67431 2023-01-11T22:23:45.3206669Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 67432 2023-01-11T22:23:45.3207280Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3207714Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3208283Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3208754Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3209307Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3209751Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3210319Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3210779Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3211208Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:23:45.3211694Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:23:45.3212341Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:23:45.3213028Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:23:45.3213602Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:23:45.3214074Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:23:45.3214431Z dist init r=0, world=2 2023-01-11T22:23:45.3214664Z dist init r=1, world=2 2023-01-11T22:23:45.3214906Z ok (4.512s) 2023-01-11T22:23:45.3215535Z test_main_wrap_api_cpu_offload_CPUOffload(offload_params=True)_backward_prefetch_BackwardPrefetch_BACKWARD_POST_forward_prefetch_True_cuda_init_mode_CUDAInitMode_CUDA_AFTER (__main__.TestFSDPWrap) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 67514 2023-01-11T22:23:45.3216241Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 67515 2023-01-11T22:23:45.3217119Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3217580Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3218244Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3218707Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3219288Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3219733Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3220301Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3220740Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3221189Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:23:45.3221685Z INFO:torch.distributed.distributed_c10d:Added key: 
store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:23:45.3222341Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:23:45.3223008Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:23:45.3223527Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:23:45.3223994Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:23:45.3224328Z dist init r=0, world=2 2023-01-11T22:23:45.3224580Z dist init r=1, world=2 2023-01-11T22:23:45.3224819Z ok (3.912s) 2023-01-11T22:23:45.3225446Z test_main_wrap_api_cpu_offload_CPUOffload(offload_params=True)_backward_prefetch_BackwardPrefetch_BACKWARD_POST_forward_prefetch_True_cuda_init_mode_CUDAInitMode_CUDA_BEFORE (__main__.TestFSDPWrap) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 67593 2023-01-11T22:23:45.3226139Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 67594 2023-01-11T22:23:45.3226753Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3227201Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3227751Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3228218Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3228792Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3229232Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3229787Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: 
UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3230341Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3230794Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:23:45.3231284Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:23:45.3231924Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:23:45.3232602Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:23:45.3233118Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:23:45.3233571Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:23:45.3233924Z dist init r=0, world=2 2023-01-11T22:23:45.3234183Z dist init r=1, world=2 2023-01-11T22:23:45.3234421Z ok (4.413s) 2023-01-11T22:23:45.3235146Z test_main_wrap_api_cpu_offload_CPUOffload(offload_params=True)_backward_prefetch_BackwardPrefetch_BACKWARD_PRE_forward_prefetch_False_cuda_init_mode_CUDAInitMode_CUDA_AFTER (__main__.TestFSDPWrap) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 67676 2023-01-11T22:23:45.3235870Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 67677 2023-01-11T22:23:45.3236482Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3236933Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3237483Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3237948Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3238521Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3238947Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3239521Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3239981Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3240428Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:23:45.3240899Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:23:45.3241554Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:23:45.3242229Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:23:45.3242747Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:23:45.3243195Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:23:45.3243545Z dist init r=1, world=2 2023-01-11T22:23:45.3243798Z dist init r=0, world=2 2023-01-11T22:23:45.3244018Z ok (3.912s) 2023-01-11T22:23:45.3244637Z test_main_wrap_api_cpu_offload_CPUOffload(offload_params=True)_backward_prefetch_BackwardPrefetch_BACKWARD_PRE_forward_prefetch_False_cuda_init_mode_CUDAInitMode_CUDA_BEFORE (__main__.TestFSDPWrap) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 67755 2023-01-11T22:23:45.3245336Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 67756 2023-01-11T22:23:45.3245944Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3246374Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3247021Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3247492Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3248049Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3248493Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3249056Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3249512Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3249941Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:23:45.3250430Z INFO:torch.distributed.distributed_c10d:Added key: 
store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:23:45.3251081Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:23:45.3251815Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:23:45.3252321Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:23:45.3252796Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:23:45.3253147Z dist init r=1, world=2 2023-01-11T22:23:45.3253380Z dist init r=0, world=2 2023-01-11T22:23:45.3253617Z ok (4.513s) 2023-01-11T22:23:45.3254235Z test_main_wrap_api_cpu_offload_CPUOffload(offload_params=True)_backward_prefetch_BackwardPrefetch_BACKWARD_PRE_forward_prefetch_True_cuda_init_mode_CUDAInitMode_CUDA_AFTER (__main__.TestFSDPWrap) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 67838 2023-01-11T22:23:45.3254936Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 67839 2023-01-11T22:23:45.3255529Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3255982Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3256868Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3257340Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3257923Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3258367Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3258934Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: 
UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3259377Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3259834Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:23:45.3260328Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:23:45.3260979Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:23:45.3261641Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:23:45.3262156Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:23:45.3262620Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:23:45.3262953Z dist init r=0, world=2 2023-01-11T22:23:45.3263202Z dist init r=1, world=2 2023-01-11T22:23:45.3263441Z ok (3.912s) 2023-01-11T22:23:45.3264070Z test_main_wrap_api_cpu_offload_CPUOffload(offload_params=True)_backward_prefetch_BackwardPrefetch_BACKWARD_PRE_forward_prefetch_True_cuda_init_mode_CUDAInitMode_CUDA_BEFORE (__main__.TestFSDPWrap) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 67917 2023-01-11T22:23:45.3264865Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 67918 2023-01-11T22:23:45.3265478Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3265926Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3266500Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3266947Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3267521Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3267968Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3268516Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3269050Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3269513Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:23:45.3270003Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:23:45.3270642Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:23:45.3271322Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:23:45.3271837Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:23:45.3272312Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:23:45.3272646Z dist init r=0, world=2 2023-01-11T22:23:45.3272896Z dist init r=1, world=2 2023-01-11T22:23:45.3273138Z ok (4.513s) 2023-01-11T22:23:45.3273578Z test_wrap_batchnorm_individually_use_or_policy_False (__main__.TestFSDPWrap) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 68000 2023-01-11T22:23:45.3274124Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 68001 2023-01-11T22:23:45.3274731Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3275160Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3275737Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3276198Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3276772Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3277196Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3277763Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3278221Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3278670Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:23:45.3279143Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:23:45.3279793Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed 
store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:23:45.3280553Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:23:45.3281123Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:23:45.3281591Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:23:45.3282850Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:23:45.3283624Z warnings.warn( 2023-01-11T22:23:45.3284813Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:23:45.3285595Z warnings.warn( 2023-01-11T22:23:45.3285825Z dist init r=1, world=2 2023-01-11T22:23:45.3286074Z dist init r=0, world=2 2023-01-11T22:23:45.3286315Z ok (3.910s) 2023-01-11T22:23:45.3286749Z test_wrap_batchnorm_individually_use_or_policy_True (__main__.TestFSDPWrap) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 68079 2023-01-11T22:23:45.3287292Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 68080 2023-01-11T22:23:45.3287899Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3288326Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3288902Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3289369Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3289948Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:23:45.3290371Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:45.3290938Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:45.3291396Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:45.3291826Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:23:45.3292318Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:23:45.3292970Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:23:45.3293662Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:23:45.3294162Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:23:45.3294627Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:23:45.3295877Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:23:45.3296990Z warnings.warn( 2023-01-11T22:23:45.3298149Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:23:45.3298916Z warnings.warn( 2023-01-11T22:23:45.3299148Z dist init r=1, world=2 2023-01-11T22:23:45.3299397Z dist init r=0, world=2 2023-01-11T22:23:45.3299637Z ok (3.910s) 2023-01-11T22:23:45.3299767Z 2023-01-11T22:23:45.3300036Z ---------------------------------------------------------------------- 2023-01-11T22:23:45.3300365Z Ran 47 tests in 99.572s 2023-01-11T22:23:45.3300527Z 2023-01-11T22:23:45.3300622Z OK 2023-01-11T22:23:45.3300756Z 2023-01-11T22:23:45.3300863Z Generating XML reports... 
2023-01-11T22:23:45.3301430Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_wrap/TEST-TestAutoWrap-20230111222205.xml 2023-01-11T22:23:45.3302230Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_wrap/TEST-TestFSDPWrap-20230111222205.xml 2023-01-11T22:23:45.3302573Z 2023-01-11T22:23:45.3302956Z ##[endgroup] 2023-01-11T22:23:45.3303537Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_wrap (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_wrap_r4nee643) 2023-01-11T22:23:45.3303875Z 2023-01-11T22:23:45.3304176Z Running distributed/optim/test_zero_redundancy_optimizer ... [2023-01-11 22:23:45.296535] 2023-01-11T22:23:45.3304902Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/optim/test_zero_redundancy_optimizer.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:23:45.296800] 2023-01-11T22:26:50.2759533Z 2023-01-11T22:26:50.2762347Z Expand the folded group to see the log file of distributed/optim/test_zero_redundancy_optimizer 2023-01-11T22:26:50.2763386Z ##[group]PRINTING LOG FILE of distributed/optim/test_zero_redundancy_optimizer (/var/lib/jenkins/workspace/test/test-reports/distributed-optim-test_zero_redundancy_optimizer_vou5qrjh) 2023-01-11T22:26:50.2763795Z 2023-01-11T22:26:50.2763913Z Running tests... 2023-01-11T22:26:50.2764410Z ---------------------------------------------------------------------- 2023-01-11T22:26:50.2771466Z Test results will be stored in test-reports/python-unittest/distributed.optim.test_zero_redundancy_optimizer 2023-01-11T22:26:50.2772019Z test_add_param_group (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:26:50.2772526Z Check that ZeroRedundancyOptimizer properly handles adding a new ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:26:50.2773031Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 68193 2023-01-11T22:26:50.2773482Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 68194 2023-01-11T22:26:50.2774128Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.2774570Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.2775812Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.2776300Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.2777382Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.2777822Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.2778412Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.2778885Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.2779521Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.2780802Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:26:50.2781911Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:26:50.2782925Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:26:50.2784290Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:26:50.2785721Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:26:50.2786485Z ok (4.151s) 2023-01-11T22:26:50.2787193Z test_collect_shards (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:26:50.2788445Z Check the state consolidation mechanism and the state dict exposed ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 68263 2023-01-11T22:26:50.2789503Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 68264 2023-01-11T22:26:50.2790896Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.2791858Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.2792953Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.2793854Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.2794914Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.2795738Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.2796820Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.2797699Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.2798551Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.2799544Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:26:50.2800042Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:26:50.2800518Z INFO:torch.distributed.distributed_c10d:Added 
key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:26:50.2801181Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:26:50.2801870Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:26:50.2802265Z ok (5.715s) 2023-01-11T22:26:50.2802758Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_False_gradient_as_bucket_view_False_static_graph_False_shard_buckets_False (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:26:50.2803478Z Check that overlapping DDP with ZeRO using the given method determined ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 68347 2023-01-11T22:26:50.2804028Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 68348 2023-01-11T22:26:50.2804640Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.2805071Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.2805647Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.2806115Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.2806696Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.2807221Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.2807801Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.2808264Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.2808683Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 
2023-01-11T22:26:50.2809158Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.2809647Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:26:50.2810143Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:26:50.2810782Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:26:50.2811474Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:26:50.2812434Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:26:50.2813500Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:26:50.2814148Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2814634Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2815112Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2815592Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2815924Z ok (4.812s) 2023-01-11T22:26:50.2816437Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_False_gradient_as_bucket_view_False_static_graph_False_shard_buckets_True (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:26:50.2817702Z Check that overlapping DDP with ZeRO using the given method determined ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 68460 2023-01-11T22:26:50.2818255Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 68461 2023-01-11T22:26:50.2818867Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.2819319Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.2819895Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.2820351Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.2820935Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.2821381Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.2821948Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.2822391Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.2822830Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.2823321Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:26:50.2823785Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:26:50.2824264Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:26:50.2825056Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:26:50.2825751Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:26:50.2826639Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:26:50.2827684Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:26:50.2828355Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2828845Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2829304Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2829857Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2830222Z ok (4.712s) 2023-01-11T22:26:50.2830731Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_False_gradient_as_bucket_view_False_static_graph_True_shard_buckets_False (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:26:50.2831418Z Check that overlapping DDP with ZeRO using the given method determined ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 68573 2023-01-11T22:26:50.2831963Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 68574 2023-01-11T22:26:50.2832579Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.2833036Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.2833596Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.2834064Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.2834643Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.2835066Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.2835637Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.2836099Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.2836538Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.2836991Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:26:50.2837478Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:26:50.2837976Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:26:50.2838634Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:26:50.2839301Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:26:50.2840203Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:26:50.2841292Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:26:50.2842086Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2842559Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2843038Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2843517Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2843844Z ok (4.712s) 2023-01-11T22:26:50.2844350Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_False_gradient_as_bucket_view_False_static_graph_True_shard_buckets_True (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:26:50.2845049Z Check that overlapping DDP with ZeRO using the given method determined ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 68686 2023-01-11T22:26:50.2845597Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 68687 2023-01-11T22:26:50.2846197Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.2846709Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.2847303Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.2847773Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.2848332Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.2848783Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.2849351Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.2849814Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.2850239Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.2850717Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:26:50.2851212Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:26:50.2851685Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:26:50.2852342Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:26:50.2853027Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:26:50.2853928Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:26:50.2854961Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:26:50.2855628Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2856113Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2856986Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2857662Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2858014Z ok (4.714s) 2023-01-11T22:26:50.2858524Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_False_gradient_as_bucket_view_True_static_graph_False_shard_buckets_False (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:26:50.2859347Z Check that overlapping DDP with ZeRO using the given method determined ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 68799 2023-01-11T22:26:50.2859878Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 68800 2023-01-11T22:26:50.2860508Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.2860960Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.2861514Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.2861984Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.2862562Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.2863015Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.2863573Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.2864135Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.2864592Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:26:50.2865080Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:26:50.2865546Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.2866027Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:26:50.2866687Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:26:50.2867350Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:26:50.2868262Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:26:50.2869309Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:26:50.2869975Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2870459Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2870915Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2871393Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2871745Z ok (4.736s) 2023-01-11T22:26:50.2872231Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_False_gradient_as_bucket_view_True_static_graph_False_shard_buckets_True (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:26:50.2872942Z Check that overlapping DDP with ZeRO using the given method determined ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 68912 2023-01-11T22:26:50.2873491Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 68913 2023-01-11T22:26:50.2874106Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.2874540Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.2875110Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.2875581Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.2876250Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.2876678Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.2877250Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.2877713Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.2878131Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.2878609Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:26:50.2879089Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:26:50.2879584Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:26:50.2880216Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:26:50.2880959Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:26:50.2881871Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:26:50.2882918Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:26:50.2883563Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2884049Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2884533Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2885010Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2885338Z ok (4.712s) 2023-01-11T22:26:50.2885844Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_False_gradient_as_bucket_view_True_static_graph_True_shard_buckets_False (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:26:50.2886550Z Check that overlapping DDP with ZeRO using the given method determined ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 69025 2023-01-11T22:26:50.2887097Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 69026 2023-01-11T22:26:50.2887692Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.2888140Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.2888713Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.2889167Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.2889751Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.2890194Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.2890764Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.2891210Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.2891645Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:26:50.2892134Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:26:50.2892602Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.2893149Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:26:50.2893813Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:26:50.2894501Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:26:50.2895384Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:26:50.2896428Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:26:50.2897615Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2898110Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2898682Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2899153Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2899499Z ok (4.812s) 2023-01-11T22:26:50.2900007Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_False_gradient_as_bucket_view_True_static_graph_True_shard_buckets_True (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:26:50.2900689Z Check that overlapping DDP with ZeRO using the given method determined ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 69138 2023-01-11T22:26:50.2901235Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 69139 2023-01-11T22:26:50.2901861Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.2902318Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.2902878Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.2903347Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.2903929Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.2904375Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.2904925Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.2905388Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.2905828Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.2906286Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:26:50.2906771Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:26:50.2907266Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:26:50.2907924Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:26:50.2908590Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:26:50.2909489Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:26:50.2910531Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:26:50.2911297Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2911764Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2912243Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2912721Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2913069Z ok (4.712s) 2023-01-11T22:26:50.2913559Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_True_gradient_as_bucket_view_False_static_graph_False_shard_buckets_False (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:26:50.2914262Z Check that overlapping DDP with ZeRO using the given method determined ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 69251 2023-01-11T22:26:50.2914811Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 69252 2023-01-11T22:26:50.2915464Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.2915931Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.2916509Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.2916981Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.2917540Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.2917986Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.2918552Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.2919022Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.2919438Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.2919932Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:26:50.2920415Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:26:50.2920875Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:26:50.2921528Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:26:50.2922212Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:26:50.2923112Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:26:50.2924159Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:26:50.2924804Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2925286Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2925759Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2926218Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2926567Z ok (4.712s) 2023-01-11T22:26:50.2927076Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_True_gradient_as_bucket_view_False_static_graph_False_shard_buckets_True (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:26:50.2927853Z Check that overlapping DDP with ZeRO using the given method determined ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 69364 2023-01-11T22:26:50.2928378Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 69365 2023-01-11T22:26:50.2928987Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.2929439Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.2930017Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.2930468Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.2931041Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.2931491Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.2932099Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.2932576Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.2933015Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:26:50.2933488Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.2933957Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:26:50.2934452Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:26:50.2935110Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:26:50.2935798Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:26:50.2937210Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:26:50.2938575Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:26:50.2939248Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2939730Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2940190Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2940669Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2941021Z ok (4.812s) 2023-01-11T22:26:50.2941580Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_True_gradient_as_bucket_view_False_static_graph_True_shard_buckets_False (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:26:50.2942274Z Check that overlapping DDP with ZeRO using the given method determined ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 69477 2023-01-11T22:26:50.2942818Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 69478 2023-01-11T22:26:50.2943431Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.2943860Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.2944433Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.2945013Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.2945598Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.2946026Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.2946598Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.2947057Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.2947493Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:26:50.2947961Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:26:50.2948439Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.2948920Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:26:50.2949562Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:26:50.2950315Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:26:50.2951234Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:26:50.2952277Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:26:50.2952942Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2953406Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2953887Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2954363Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2954693Z ok (4.713s) 2023-01-11T22:26:50.2955198Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_True_gradient_as_bucket_view_False_static_graph_True_shard_buckets_True (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:26:50.2955900Z Check that overlapping DDP with ZeRO using the given method determined ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 69590 2023-01-11T22:26:50.2956443Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 69591 2023-01-11T22:26:50.2957036Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.2957490Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.2958069Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.2958541Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.2959106Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.2959551Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.2960118Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.2960565Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.2961000Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:26:50.2961471Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.2962025Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:26:50.2962511Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:26:50.2963171Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:26:50.2963850Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:26:50.2964748Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:26:50.2965768Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:26:50.2966433Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2966968Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2967457Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2967913Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2968257Z ok (4.712s) 2023-01-11T22:26:50.2968765Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_True_gradient_as_bucket_view_True_static_graph_False_shard_buckets_False (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:26:50.2969465Z Check that overlapping DDP with ZeRO using the given method determined ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 69703 2023-01-11T22:26:50.2969989Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 69704 2023-01-11T22:26:50.2970611Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.2971063Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.2971624Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.2972091Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.2972671Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.2973113Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.2973665Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.2974128Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.2974567Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.2975016Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:26:50.2975503Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:26:50.2975999Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:26:50.2977118Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:26:50.2977845Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:26:50.2978751Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:26:50.2979912Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:26:50.2980578Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2981042Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2981522Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2981996Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2982344Z ok (4.714s) 2023-01-11T22:26:50.2982830Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_True_gradient_as_bucket_view_True_static_graph_False_shard_buckets_True (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:26:50.2983532Z Check that overlapping DDP with ZeRO using the given method determined ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 69816 2023-01-11T22:26:50.2984142Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 69817 2023-01-11T22:26:50.2984779Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.2985210Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.2985785Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.2986252Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.2986813Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.2987258Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.2987826Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.2988295Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.2988717Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:26:50.2989206Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:26:50.2989687Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.2990167Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:26:50.2990799Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:26:50.2991480Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:26:50.2992388Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:26:50.2993431Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:26:50.2994078Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2994562Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2995036Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2995515Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.2995844Z ok (4.713s) 2023-01-11T22:26:50.2996431Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_True_gradient_as_bucket_view_True_static_graph_True_shard_buckets_False (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:26:50.2997137Z Check that overlapping DDP with ZeRO using the given method determined ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 69929 2023-01-11T22:26:50.2997664Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 69930 2023-01-11T22:26:50.2998278Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.2998728Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.2999300Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.2999752Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3000334Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3000781Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3001408Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3001864Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3002304Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:26:50.3002774Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.3003239Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:26:50.3003729Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:26:50.3004385Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:26:50.3005078Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:26:50.3005967Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:26:50.3007017Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:26:50.3007685Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.3008166Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.3008625Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.3009104Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.3009450Z ok (4.813s) 2023-01-11T22:26:50.3009956Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_True_gradient_as_bucket_view_True_static_graph_True_shard_buckets_True (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:26:50.3010639Z Check that overlapping DDP with ZeRO using the given method determined ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 70042 2023-01-11T22:26:50.3011179Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 70043 2023-01-11T22:26:50.3011789Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3012238Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3012790Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3013322Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3013907Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3014333Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3014902Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3015362Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3015796Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:26:50.3016246Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.3017216Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:26:50.3017744Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:26:50.3018499Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:26:50.3019211Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:26:50.3020111Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:26:50.3021150Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:26:50.3021816Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.3022286Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.3022765Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.3023240Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.3023565Z ok (4.815s) 2023-01-11T22:26:50.3023995Z test_local_optimizer_parity_optimizer_class_str_AdamW_maximize_False (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:26:50.3024619Z When combined with DDP, check that a local optimizer gives the same ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 70155 2023-01-11T22:26:50.3025154Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 70156 2023-01-11T22:26:50.3025747Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3026203Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3026777Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3027249Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3027806Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3028250Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3028819Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3029265Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3029710Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.3030197Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:26:50.3030775Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:26:50.3031244Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:26:50.3031909Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:26:50.3032597Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:26:50.3033667Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2023-01-11T22:26:50.3035251Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. 
(function operator()) 2023-01-11T22:26:50.3036382Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3037129Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3037863Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3038590Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3039324Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3040061Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3040791Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3041561Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3042295Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3043028Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters 
changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3043758Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3044489Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3045264Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3045997Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3046724Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3047452Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3047899Z ok (5.115s) 2023-01-11T22:26:50.3048330Z test_local_optimizer_parity_optimizer_class_str_AdamW_maximize_True (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:26:50.3048957Z When combined with DDP, check that a local optimizer gives the same ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 70238 2023-01-11T22:26:50.3049546Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 70239 2023-01-11T22:26:50.3050156Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3050607Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3051185Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3051633Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3052208Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3052651Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3053229Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3053677Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3054112Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:26:50.3054583Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.3055073Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:26:50.3055548Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:26:50.3056202Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:26:50.3057336Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:26:50.3058418Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2023-01-11T22:26:50.3059923Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. 
(function operator()) 2023-01-11T22:26:50.3061155Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3061880Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3062620Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3063356Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3064087Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3064869Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3065615Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3066355Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3067085Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3067809Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters 
changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3068526Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3069254Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3069985Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3070779Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3071506Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3072221Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3072689Z ok (5.115s) 2023-01-11T22:26:50.3073121Z test_local_optimizer_parity_optimizer_class_str_Adam_maximize_False (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:26:50.3073727Z When combined with DDP, check that a local optimizer gives the same ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 70321 2023-01-11T22:26:50.3074262Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 70322 2023-01-11T22:26:50.3074884Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3075337Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3075966Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3076441Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3077022Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3077465Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3078013Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3078474Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3078912Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:26:50.3079381Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:26:50.3079868Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.3080353Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:26:50.3081062Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:26:50.3081739Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:26:50.3082805Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2023-01-11T22:26:50.3084315Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. 
(function operator()) 2023-01-11T22:26:50.3085414Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3086156Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3086896Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3087629Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3088345Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3089082Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3089812Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3090601Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3091311Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3092029Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters 
changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3092762Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3093490Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3094219Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3094970Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3095708Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3096437Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3097402Z ok (5.015s) 2023-01-11T22:26:50.3097820Z test_local_optimizer_parity_optimizer_class_str_Adam_maximize_True (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:26:50.3098443Z When combined with DDP, check that a local optimizer gives the same ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 70404 2023-01-11T22:26:50.3098988Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 70405 2023-01-11T22:26:50.3099620Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3100055Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3100628Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3101096Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3101673Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3102098Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3102663Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3103132Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3103553Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.3104041Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:26:50.3104525Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:26:50.3105006Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:26:50.3105642Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:26:50.3106327Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:26:50.3107529Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2023-01-11T22:26:50.3109037Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. 
(function operator()) 2023-01-11T22:26:50.3110218Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3110974Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3111692Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3112421Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3113162Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3113903Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3114627Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3115335Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3116062Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3116789Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters 
changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3117518Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3118245Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3118949Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3119673Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3120398Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3121211Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3121657Z ok (5.115s) 2023-01-11T22:26:50.3122086Z test_local_optimizer_parity_optimizer_class_str_SGD_maximize_False (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:26:50.3122703Z When combined with DDP, check that a local optimizer gives the same ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 70487 2023-01-11T22:26:50.3123238Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 70488 2023-01-11T22:26:50.3123838Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3124288Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3124866Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3125319Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3125939Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3126398Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3126972Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3127417Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3127853Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:26:50.3128323Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.3128811Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:26:50.3129291Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:26:50.3129953Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:26:50.3130643Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:26:50.3131707Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2023-01-11T22:26:50.3133222Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. 
(function operator()) 2023-01-11T22:26:50.3134340Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3135063Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3135798Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3137123Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3137971Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3138683Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3139410Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3140144Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3140963Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3141757Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters 
changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3142465Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3143198Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3143928Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3144663Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3145367Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3146090Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3146559Z ok (5.037s) 2023-01-11T22:26:50.3146983Z test_local_optimizer_parity_optimizer_class_str_SGD_maximize_True (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:26:50.3147579Z When combined with DDP, check that a local optimizer gives the same ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 70570 2023-01-11T22:26:50.3148114Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 70571 2023-01-11T22:26:50.3148746Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3149199Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3149758Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3150227Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3150804Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3151230Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3151791Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3152343Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3152787Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.3153241Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:26:50.3153716Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:26:50.3154207Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:26:50.3154864Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:26:50.3155530Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:26:50.3156654Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2023-01-11T22:26:50.3158153Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. 
(function operator()) 2023-01-11T22:26:50.3159273Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3160016Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3160750Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3161463Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3162192Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3162932Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3163664Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3164389Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3165097Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3165821Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters 
changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3166602Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3167330Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3168055Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3168762Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3169488Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3170222Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:26:50.3170691Z ok (5.015s) 2023-01-11T22:26:50.3171072Z test_lr_scheduler (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:26:50.3171629Z Check that a normal PyTorch ``lr_scheduler`` is usable with ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 70653 2023-01-11T22:26:50.3172152Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 70654 2023-01-11T22:26:50.3172754Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3173205Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3173780Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3174253Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3174803Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3175253Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3175820Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3176280Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3176946Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:26:50.3177433Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.3177919Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:26:50.3178389Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:26:50.3179058Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:26:50.3179746Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:26:50.3180139Z ok (5.213s) 2023-01-11T22:26:50.3180489Z test_multiple_param_groups (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:26:50.3181074Z Check parity between constructing ZeRO with multiple parameter groups ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 70737 2023-01-11T22:26:50.3181613Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 70738 2023-01-11T22:26:50.3182221Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3182652Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3183313Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3183772Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3184326Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3184795Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3185376Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3185836Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3186254Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:26:50.3186725Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.3187205Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:26:50.3187684Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:26:50.3188408Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed 
store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:26:50.3189109Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:26:50.3189499Z ok (5.816s) 2023-01-11T22:26:50.3189855Z test_nondefault_process_group (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:26:50.3190578Z Check that ZeroRedundancyOptimizer works with a non-default process ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 70821 2023-01-11T22:26:50.3191130Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 70822 2023-01-11T22:26:50.3191737Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3192169Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3192744Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3193209Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3193765Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3194207Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3194772Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3195226Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3195645Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.3196172Z INFO:torch.testing._internal.common_distributed:Skipping `test_nondefault_process_group()` since world size of 2 is less than 4 2023-01-11T22:26:50.3196683Z INFO:torch.testing._internal.common_distributed:Starting 
event listener thread for rank 1 2023-01-11T22:26:50.3197201Z INFO:torch.testing._internal.common_distributed:Skipping `test_nondefault_process_group()` since world size of 2 is less than 4 2023-01-11T22:26:50.3197568Z ok (2.610s) 2023-01-11T22:26:50.3215660Z test_sharding (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:26:50.3216463Z Check ZeroRedundancyOptimizer's parameter sharding at construction ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 70889 2023-01-11T22:26:50.3217253Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 70890 2023-01-11T22:26:50.3217886Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3218489Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3219074Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3219543Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3220101Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3220540Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3221111Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3221571Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3221987Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.3222454Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:26:50.3222929Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:26:50.3223477Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:26:50.3224130Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:26:50.3224789Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:26:50.3225156Z ok (2.509s) 2023-01-11T22:26:50.3225466Z test_step (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:26:50.3226012Z Check that ZeroRedundancyOptimizer properly exposes the ``step()`` ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 70959 2023-01-11T22:26:50.3226548Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 70960 2023-01-11T22:26:50.3227157Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3227581Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3228132Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3228574Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3229122Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3229555Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3230119Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3230575Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3230991Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:26:50.3231478Z INFO:torch.distributed.distributed_c10d:Added key: 
store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:26:50.3231968Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.3232423Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:26:50.3233074Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:26:50.3233756Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:26:50.3234144Z ok (4.613s) 2023-01-11T22:26:50.3234485Z test_step_with_closure (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:26:50.3235056Z Check that ZeroRedundancyOptimizer properly exposes the ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 71042 2023-01-11T22:26:50.3235682Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 71043 2023-01-11T22:26:50.3236294Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3236723Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3237298Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3237763Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3238316Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3238751Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3239313Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3239775Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 
2023-01-11T22:26:50.3240243Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.3240739Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:26:50.3241217Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:26:50.3241739Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:26:50.3242398Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:26:50.3243080Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:26:50.3243477Z ok (4.613s) 2023-01-11T22:26:50.3243814Z test_zero_join_cpu (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:26:50.3244371Z Check that the ZeRO join hook allows training with uneven inputs ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 71125 2023-01-11T22:26:50.3244901Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 71126 2023-01-11T22:26:50.3245510Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3245939Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3246509Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3246973Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3247523Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3247958Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3248522Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3248982Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3249399Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:26:50.3249870Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.3250350Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:26:50.3250842Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:26:50.3251478Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:26:50.3252156Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:26:50.3252753Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.3253223Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.3253871Z /opt/conda/lib/python3.10/tempfile.py:860: ResourceWarning: Implicitly cleaning up 2023-01-11T22:26:50.3254325Z _warnings.warn(warn_message, ResourceWarning) 2023-01-11T22:26:50.3254897Z /opt/conda/lib/python3.10/tempfile.py:860: ResourceWarning: Implicitly cleaning up 2023-01-11T22:26:50.3255327Z _warnings.warn(warn_message, ResourceWarning) 2023-01-11T22:26:50.3255609Z ok (2.708s) 2023-01-11T22:26:50.3255962Z test_zero_join_gpu (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:26:50.3256486Z Check that the ZeRO join hook allows training with uneven inputs ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 71203 2023-01-11T22:26:50.3257288Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 71204 2023-01-11T22:26:50.3257993Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3258457Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3259017Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3259487Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3260062Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3260501Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3261047Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3261511Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled 
tests") 2023-01-11T22:26:50.3261946Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:26:50.3262419Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:26:50.3262902Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.3263383Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:26:50.3264034Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:26:50.3264696Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:26:50.3265217Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.3265698Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:26:50.3266351Z /opt/conda/lib/python3.10/tempfile.py:860: ResourceWarning: Implicitly cleaning up 2023-01-11T22:26:50.3266790Z _warnings.warn(warn_message, ResourceWarning) 2023-01-11T22:26:50.3267364Z /opt/conda/lib/python3.10/tempfile.py:860: ResourceWarning: Implicitly cleaning up 2023-01-11T22:26:50.3267807Z _warnings.warn(warn_message, ResourceWarning) 2023-01-11T22:26:50.3268070Z ok (5.814s) 2023-01-11T22:26:50.3268478Z test_zero_model_parallel_parameters_as_bucket_view_False (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:26:50.3269206Z Check that ZeRO works with model parallelism where the model's ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 71287 2023-01-11T22:26:50.3269737Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 71288 2023-01-11T22:26:50.3270414Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3270869Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3271435Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3271880Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3272454Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3272900Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3273467Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3273907Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3274341Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:26:50.3274804Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.3275230Z skip: Need at least 4 CUDA devices (2.608s) 2023-01-11T22:26:50.3275699Z test_zero_model_parallel_parameters_as_bucket_view_True (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:26:50.3276438Z Check that ZeRO works with model parallelism where the model's ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 71355 2023-01-11T22:26:50.3276969Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 71356 2023-01-11T22:26:50.3277554Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3278000Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3278572Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3279040Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3279599Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3280038Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3280606Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3281047Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3281479Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.3281944Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:26:50.3282336Z skip: Need at least 4 CUDA devices (2.608s) 2023-01-11T22:26:50.3282709Z test_constructor (__main__.TestZeroRedundancyOptimizerSingleRank) 2023-01-11T22:26:50.3283277Z Check the robustness of the ZeroRedundancyOptimizer constructor by ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 71423 2023-01-11T22:26:50.3283981Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3284409Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3284980Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3285445Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3285879Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.3286343Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:26:50.3286999Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:26:50.3287460Z ok (2.407s) 2023-01-11T22:26:50.3287795Z test_lr_scheduler (__main__.TestZeroRedundancyOptimizerSingleRank) 2023-01-11T22:26:50.3288344Z Check that a normal PyTorch ``lr_scheduler`` is usable with ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 71458 2023-01-11T22:26:50.3289022Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3289472Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3290029Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3290494Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3290930Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.3291418Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:26:50.3292109Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:26:50.3292514Z ok (3.910s) 2023-01-11T22:26:50.3292878Z test_same_dense_param_type (__main__.TestZeroRedundancyOptimizerSingleRank) 2023-01-11T22:26:50.3293441Z Check that ZeroRedundancyOptimizer raises an exception if the input ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 71500 2023-01-11T22:26:50.3294135Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3294582Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3295155Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3295608Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3296042Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.3296528Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:26:50.3297421Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:26:50.3297798Z ok (2.507s) 2023-01-11T22:26:50.3298148Z test_state_dict (__main__.TestZeroRedundancyOptimizerSingleRank) 2023-01-11T22:26:50.3298714Z Check that ZeroRedundancyOptimizer exposes the expected state dict ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 71535 2023-01-11T22:26:50.3299400Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3299849Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3300424Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3300895Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3301312Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.3301796Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:26:50.3302447Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:26:50.3302821Z ok (4.011s) 2023-01-11T22:26:50.3303182Z test_step_with_extra_inner_key (__main__.TestZeroRedundancyOptimizerSingleRank) 2023-01-11T22:26:50.3303763Z Check that ZeroRedundancyOptimizer wrapping an optimizer that adds ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 71577 2023-01-11T22:26:50.3304581Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3305010Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3305585Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3306049Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3306482Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.3306947Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:26:50.3307600Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:26:50.3307991Z ok (3.909s) 2023-01-11T22:26:50.3308325Z test_step_with_kwargs (__main__.TestZeroRedundancyOptimizerSingleRank) 2023-01-11T22:26:50.3308882Z Check that the ``step(**kwargs)`` interface is properly exposed. ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 71619 2023-01-11T22:26:50.3309636Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3310101Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3310661Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3311128Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3311567Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.3312037Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:26:50.3312694Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:26:50.3313086Z ok (3.910s) 2023-01-11T22:26:50.3313451Z test_step_without_closure (__main__.TestZeroRedundancyOptimizerSingleRank) 2023-01-11T22:26:50.3313994Z Check that the ``step()`` method (without closure) is handled as ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 71661 2023-01-11T22:26:50.3314675Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3315118Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3315685Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3316136Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3316572Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.3317056Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:26:50.3317692Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:26:50.3318086Z ok (4.011s) 2023-01-11T22:26:50.3318434Z test_zero_grad (__main__.TestZeroRedundancyOptimizerSingleRank) 2023-01-11T22:26:50.3318963Z Check that the ``zero_grad`` method is properly handled. ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 71703 2023-01-11T22:26:50.3319619Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:26:50.3320064Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:26:50.3320637Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:26:50.3321087Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:26:50.3321598Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:26:50.3322086Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:26:50.3322755Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:26:50.3323131Z ok (2.407s) 2023-01-11T22:26:50.3323283Z 2023-01-11T22:26:50.3323554Z ---------------------------------------------------------------------- 2023-01-11T22:26:50.3323883Z Ran 42 tests in 182.397s 2023-01-11T22:26:50.3324050Z 2023-01-11T22:26:50.3324141Z OK (skipped=2) 2023-01-11T22:26:50.3324296Z 2023-01-11T22:26:50.3324422Z Generating XML reports... 
2023-01-11T22:26:50.3325142Z Generated XML report: test-reports/python-unittest/distributed.optim.test_zero_redundancy_optimizer/TEST-TestZeroRedundancyOptimizerDistributed-20230111222347.xml 2023-01-11T22:26:50.3326124Z Generated XML report: test-reports/python-unittest/distributed.optim.test_zero_redundancy_optimizer/TEST-TestZeroRedundancyOptimizerSingleRank-20230111222347.xml 2023-01-11T22:26:50.3326561Z 2023-01-11T22:26:50.3326930Z ##[endgroup] 2023-01-11T22:26:50.3327653Z FINISHED PRINTING LOG FILE of distributed/optim/test_zero_redundancy_optimizer (/var/lib/jenkins/workspace/test/test-reports/distributed-optim-test_zero_redundancy_optimizer_vou5qrjh) 2023-01-11T22:26:50.3328060Z 2023-01-11T22:26:50.3328314Z Running distributed/test_c10d_gloo ... [2023-01-11 22:26:50.276786] 2023-01-11T22:26:50.3328994Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/test_c10d_gloo.py', '-v', '--subprocess', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:26:50.277035] 2023-01-11T22:40:48.5450940Z 2023-01-11T22:40:48.5453558Z Expand the folded group to see the log file of distributed/test_c10d_gloo 2023-01-11T22:40:48.5454462Z ##[group]PRINTING LOG FILE of distributed/test_c10d_gloo (/var/lib/jenkins/workspace/test/test-reports/distributed-test_c10d_gloo_oniv3k8o) 2023-01-11T22:40:48.5455059Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpz83ksg0n 2023-01-11T22:40:48.5456067Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpz83ksg0n/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5458917Z , <__main__.CommTest testMethod=test_broadcast_coalesced_gloo_cuda>, <__main__.CommTest testMethod=test_gloo_barrier_device_ids>, <__main__.CommTest testMethod=test_gloo_rank_membership>, <__main__.CommTest testMethod=test_gloo_warn_not_in_group>, <__main__.CommTest testMethod=test_sequence_num_incremented_gloo_default>, <__main__.CommTest testMethod=test_sequence_num_incremented_gloo_subgroup>, 
<__main__.CommTest testMethod=test_sequence_num_set_default_pg_gloo>, <__main__.CommTest testMethod=test_sequence_num_set_gloo_new_group>, <__main__.CommTest testMethod=test_tensor_dtype_complex>, <__main__.CommTest testMethod=test_tensor_dtype_mismatch>]> 2023-01-11T22:40:48.5460151Z test_broadcast_coalesced_gloo_cpu (__main__.CommTest) 2023-01-11T22:40:48.5460511Z test_broadcast_coalesced_gloo_cuda (__main__.CommTest) 2023-01-11T22:40:48.5460855Z test_gloo_barrier_device_ids (__main__.CommTest) 2023-01-11T22:40:48.5461174Z test_gloo_rank_membership (__main__.CommTest) 2023-01-11T22:40:48.5461503Z test_gloo_warn_not_in_group (__main__.CommTest) 2023-01-11T22:40:48.5461863Z test_sequence_num_incremented_gloo_default (__main__.CommTest) 2023-01-11T22:40:48.5462427Z test_sequence_num_incremented_gloo_subgroup (__main__.CommTest) 2023-01-11T22:40:48.5462796Z test_sequence_num_set_default_pg_gloo (__main__.CommTest) 2023-01-11T22:40:48.5463154Z test_sequence_num_set_gloo_new_group (__main__.CommTest) 2023-01-11T22:40:48.5463496Z test_tensor_dtype_complex (__main__.CommTest) 2023-01-11T22:40:48.5463807Z test_tensor_dtype_mismatch (__main__.CommTest) 2023-01-11T22:40:48.5471027Z , <__main__.CompilerTest testMethod=test_allgather_work_wait_gpu>, <__main__.CompilerTest testMethod=test_allreduce_work_wait_cpu>, <__main__.CompilerTest testMethod=test_allreduce_work_wait_gpu>, <__main__.CompilerTest testMethod=test_broadcast_work_wait_cpu>, <__main__.CompilerTest testMethod=test_broadcast_work_wait_gpu>, <__main__.CompilerTest testMethod=test_consecutive_comm_work_wait_cpu>, <__main__.CompilerTest testMethod=test_consecutive_comm_work_wait_gpu>, <__main__.CompilerTest testMethod=test_nested_comm_tensor_wrapping>, <__main__.CompilerTest testMethod=test_scatter_work_wait_cpu>, <__main__.CompilerTest testMethod=test_scatter_work_wait_gpu>]> 2023-01-11T22:40:48.5473328Z test_allgather_work_wait_cpu (__main__.CompilerTest) 2023-01-11T22:40:48.5473952Z test_allgather_work_wait_gpu 
(__main__.CompilerTest) 2023-01-11T22:40:48.5474530Z test_allreduce_work_wait_cpu (__main__.CompilerTest) 2023-01-11T22:40:48.5475126Z test_allreduce_work_wait_gpu (__main__.CompilerTest) 2023-01-11T22:40:48.5475645Z test_broadcast_work_wait_cpu (__main__.CompilerTest) 2023-01-11T22:40:48.5476192Z test_broadcast_work_wait_gpu (__main__.CompilerTest) 2023-01-11T22:40:48.5476941Z test_consecutive_comm_work_wait_cpu (__main__.CompilerTest) 2023-01-11T22:40:48.5477636Z test_consecutive_comm_work_wait_gpu (__main__.CompilerTest) 2023-01-11T22:40:48.5478086Z test_nested_comm_tensor_wrapping (__main__.CompilerTest) 2023-01-11T22:40:48.5478482Z test_scatter_work_wait_cpu (__main__.CompilerTest) 2023-01-11T22:40:48.5478826Z test_scatter_work_wait_gpu (__main__.CompilerTest) 2023-01-11T22:40:48.5486466Z , <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_dynamic_weight_sharing>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_once_use_reentrant_False>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_once_use_reentrant_True>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_twice_static_graph_use_reentrant_False>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_twice_static_graph_use_reentrant_True>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_twice_use_reentrant_False>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_twice_use_reentrant_True>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_twice_weight_sharing>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_unused_params_use_reentrant_False>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_unused_params_use_reentrant_True>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_weight_sharing_use_reentrant_False>, <__main__.DistributedDataParallelTest 
testMethod=test_ddp_checkpointing_weight_sharing_use_reentrant_True>, <__main__.DistributedDataParallelTest testMethod=test_ddp_comm_hook_future_passing_cpu>, <__main__.DistributedDataParallelTest testMethod=test_ddp_comm_hook_future_passing_gpu_gloo>, <__main__.DistributedDataParallelTest testMethod=test_ddp_comm_hook_register_just_once>, <__main__.DistributedDataParallelTest testMethod=test_ddp_comm_hook_sparse_gradients>, <__main__.DistributedDataParallelTest testMethod=test_ddp_invalid_comm_hook_init>, <__main__.DistributedDataParallelTest testMethod=test_ddp_invalid_comm_hook_return_type>, <__main__.DistributedDataParallelTest testMethod=test_find_unused_parameters_when_unused_parameters_empty>, <__main__.DistributedDataParallelTest testMethod=test_global_local_unused_params_grad>, <__main__.DistributedDataParallelTest testMethod=test_global_local_unused_params_grad_with_grad_is_view>, <__main__.DistributedDataParallelTest testMethod=test_global_local_unused_params_grad_with_static_graph>, <__main__.DistributedDataParallelTest testMethod=test_gloo_backend_1gpu_module_device_ids_integer_list>, <__main__.DistributedDataParallelTest testMethod=test_gloo_backend_1gpu_module_device_ids_torch_device_list>, <__main__.DistributedDataParallelTest testMethod=test_gloo_backend_2gpu_module>, <__main__.DistributedDataParallelTest testMethod=test_gloo_backend_4gpu_module>, <__main__.DistributedDataParallelTest testMethod=test_gloo_backend_cpu_module>, <__main__.DistributedDataParallelTest testMethod=test_gloo_backend_cpu_module_grad_is_view>, <__main__.DistributedDataParallelTest testMethod=test_ignored_output>, <__main__.DistributedDataParallelTest testMethod=test_ignored_output_with_unused_parameters>, <__main__.DistributedDataParallelTest testMethod=test_ignored_sharded_tensor>, <__main__.DistributedDataParallelTest testMethod=test_invalid_powerSGD_state>, <__main__.DistributedDataParallelTest testMethod=test_save_load_checkpoint>, <__main__.DistributedDataParallelTest 
testMethod=test_sparse_gradients>, <__main__.DistributedDataParallelTest testMethod=test_sparse_gradients_grad_is_view>, <__main__.DistributedDataParallelTest testMethod=test_sync_batch_norm_empty_input>, <__main__.DistributedDataParallelTest testMethod=test_sync_batch_norm_only_empty_input>]> 2023-01-11T22:40:48.5492699Z test_ddp_checkpointing_dynamic_module (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.5493483Z test_ddp_checkpointing_dynamic_weight_sharing (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.5494045Z test_ddp_checkpointing_once_use_reentrant_False (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.5494970Z test_ddp_checkpointing_once_use_reentrant_True (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.5495971Z test_ddp_checkpointing_twice_static_graph_use_reentrant_False (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.5497380Z test_ddp_checkpointing_twice_static_graph_use_reentrant_True (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.5498122Z test_ddp_checkpointing_twice_use_reentrant_False (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.5498607Z test_ddp_checkpointing_twice_use_reentrant_True (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.5499247Z test_ddp_checkpointing_twice_weight_sharing (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.5499748Z test_ddp_checkpointing_unused_params_use_reentrant_False (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.5500235Z test_ddp_checkpointing_unused_params_use_reentrant_True (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.5500747Z test_ddp_checkpointing_weight_sharing_use_reentrant_False (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.5501252Z test_ddp_checkpointing_weight_sharing_use_reentrant_True (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.5501735Z test_ddp_comm_hook_future_passing_cpu (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.5502170Z 
test_ddp_comm_hook_future_passing_gpu_gloo (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.5502754Z test_ddp_comm_hook_register_just_once (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.5503944Z test_ddp_comm_hook_sparse_gradients (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.5504832Z test_ddp_invalid_comm_hook_init (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.5505735Z test_ddp_invalid_comm_hook_return_type (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.5506921Z test_find_unused_parameters_when_unused_parameters_empty (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.5507940Z test_global_local_unused_params_grad (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.5508850Z test_global_local_unused_params_grad_with_grad_is_view (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.5509775Z test_global_local_unused_params_grad_with_static_graph (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.5510667Z test_gloo_backend_1gpu_module_device_ids_integer_list (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.5511594Z test_gloo_backend_1gpu_module_device_ids_torch_device_list (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.5512632Z test_gloo_backend_2gpu_module (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.5513408Z test_gloo_backend_4gpu_module (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.5514175Z test_gloo_backend_cpu_module (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.5514958Z test_gloo_backend_cpu_module_grad_is_view (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.5515790Z test_ignored_output (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.5516621Z test_ignored_output_with_unused_parameters (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.5517428Z test_ignored_sharded_tensor (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.5518231Z test_invalid_powerSGD_state 
(__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.5519001Z test_save_load_checkpoint (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.5519745Z test_sparse_gradients (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.5520523Z test_sparse_gradients_grad_is_view (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.5521361Z test_sync_batch_norm_empty_input (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.5522174Z test_sync_batch_norm_only_empty_input (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.5524646Z , <__main__.GlooProcessGroupWithDispatchedCollectivesTests testMethod=test_allgather_coalesced>, <__main__.GlooProcessGroupWithDispatchedCollectivesTests testMethod=test_allreduce_coalesced>, <__main__.GlooProcessGroupWithDispatchedCollectivesTests testMethod=test_collectives>, <__main__.GlooProcessGroupWithDispatchedCollectivesTests testMethod=test_monitored_barrier>]> 2023-01-11T22:40:48.5526901Z test_all_to_all_single (__main__.GlooProcessGroupWithDispatchedCollectivesTests) 2023-01-11T22:40:48.5527851Z test_allgather_coalesced (__main__.GlooProcessGroupWithDispatchedCollectivesTests) 2023-01-11T22:40:48.5528851Z test_allreduce_coalesced (__main__.GlooProcessGroupWithDispatchedCollectivesTests) 2023-01-11T22:40:48.5529837Z test_collectives (__main__.GlooProcessGroupWithDispatchedCollectivesTests) 2023-01-11T22:40:48.5530754Z test_monitored_barrier (__main__.GlooProcessGroupWithDispatchedCollectivesTests) 2023-01-11T22:40:48.5531546Z 2023-01-11T22:40:48.5541981Z , <__main__.ProcessGroupGlooTest testMethod=test_allgather_basics_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_allgather_checks>, <__main__.ProcessGroupGlooTest testMethod=test_allgather_coalesced_async>, <__main__.ProcessGroupGlooTest testMethod=test_allgather_coalesced_checks>, <__main__.ProcessGroupGlooTest testMethod=test_allgather_noncontiguous_input>, <__main__.ProcessGroupGlooTest testMethod=test_allgather_stress>, 
<__main__.ProcessGroupGlooTest testMethod=test_allgather_stress_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_basics>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_basics_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_basics_cuda_using_work_api>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_basics_using_work_api>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_checks>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_coalesced_async>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_coalesced_basics>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_coalesced_checks>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_coalesced_checks_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_coalesced_stress>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_stress>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_stress_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_barrier_implies_wait>, <__main__.ProcessGroupGlooTest testMethod=test_broadcast_basics>, <__main__.ProcessGroupGlooTest testMethod=test_broadcast_basics_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_broadcast_checks>, <__main__.ProcessGroupGlooTest testMethod=test_broadcast_stress>, <__main__.ProcessGroupGlooTest testMethod=test_broadcast_stress_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_empty_tensors>, <__main__.ProcessGroupGlooTest testMethod=test_gather_basics>, <__main__.ProcessGroupGlooTest testMethod=test_gather_basics_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_gather_checks>, <__main__.ProcessGroupGlooTest testMethod=test_gather_noncontiguous_input>, <__main__.ProcessGroupGlooTest testMethod=test_gather_stress>, <__main__.ProcessGroupGlooTest testMethod=test_gather_stress_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_multi_device_constructor>, <__main__.ProcessGroupGlooTest testMethod=test_reduce_basics>, 
<__main__.ProcessGroupGlooTest testMethod=test_reduce_basics_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_reduce_checks>, <__main__.ProcessGroupGlooTest testMethod=test_reduce_stress>, <__main__.ProcessGroupGlooTest testMethod=test_reduce_stress_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_round_robin>, <__main__.ProcessGroupGlooTest testMethod=test_round_robin_create_destroy>, <__main__.ProcessGroupGlooTest testMethod=test_scatter_basics>, <__main__.ProcessGroupGlooTest testMethod=test_scatter_basics_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_scatter_checks>, <__main__.ProcessGroupGlooTest testMethod=test_scatter_stress>, <__main__.ProcessGroupGlooTest testMethod=test_scatter_stress_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_send_recv_all_to_all>, <__main__.ProcessGroupGlooTest testMethod=test_sparse_allreduce_basics>, <__main__.ProcessGroupGlooTest testMethod=test_sparse_allreduce_basics_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_sparse_allreduce_checks>]> 2023-01-11T22:40:48.5551520Z test_allgather_basics (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5552238Z test_allgather_basics_cuda (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5552929Z test_allgather_checks (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5553621Z test_allgather_coalesced_async (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5554379Z test_allgather_coalesced_checks (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5555192Z test_allgather_noncontiguous_input (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5555822Z test_allgather_stress (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5556505Z test_allgather_stress_cuda (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5557213Z test_allreduce_basics (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5557891Z test_allreduce_basics_cuda (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5558616Z test_allreduce_basics_cuda_using_work_api 
(__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5559396Z test_allreduce_basics_using_work_api (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5560132Z test_allreduce_checks (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5560918Z test_allreduce_coalesced_async (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5561635Z test_allreduce_coalesced_basics (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5562033Z test_allreduce_coalesced_checks (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5562428Z test_allreduce_coalesced_checks_cuda (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5562834Z test_allreduce_coalesced_stress (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5563315Z test_allreduce_stress (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5564039Z test_allreduce_stress_cuda (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5564740Z test_barrier_implies_wait (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5565562Z test_broadcast_basics (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5566254Z test_broadcast_basics_cuda (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5567037Z test_broadcast_checks (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5567825Z test_broadcast_stress (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5568681Z test_broadcast_stress_cuda (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5569356Z test_empty_tensors (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5570082Z test_gather_basics (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5570865Z test_gather_basics_cuda (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5571604Z test_gather_checks (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5572310Z test_gather_noncontiguous_input (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5573017Z test_gather_stress (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5573668Z test_gather_stress_cuda (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5574361Z 
test_multi_device_constructor (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5574998Z test_reduce_basics (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5575684Z test_reduce_basics_cuda (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5576352Z test_reduce_checks (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5577553Z test_reduce_stress (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5578212Z test_reduce_stress_cuda (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5578888Z test_round_robin (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5579672Z test_round_robin_create_destroy (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5580386Z test_scatter_basics (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5581035Z test_scatter_basics_cuda (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5581717Z test_scatter_checks (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5582400Z test_scatter_stress (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5583064Z test_scatter_stress_cuda (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5583751Z test_send_recv_all_to_all (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5584434Z test_sparse_allreduce_basics (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5585180Z test_sparse_allreduce_basics_cuda (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5585932Z test_sparse_allreduce_checks (__main__.ProcessGroupGlooTest) 2023-01-11T22:40:48.5587475Z , <__main__.ReducerTest testMethod=test_forward_backward_optimizer>, <__main__.ReducerTest testMethod=test_forward_backward_unused_parameters>, <__main__.ReducerTest testMethod=test_multi_dtype_multi_bucket>, <__main__.ReducerTest testMethod=test_multi_dtype_single_bucket>, <__main__.ReducerTest testMethod=test_single_dtype_single_bucket>]> 2023-01-11T22:40:48.5588941Z test_forward_backward (__main__.ReducerTest) 2023-01-11T22:40:48.5589589Z test_forward_backward_optimizer (__main__.ReducerTest) 2023-01-11T22:40:48.5590281Z 
test_forward_backward_unused_parameters (__main__.ReducerTest) 2023-01-11T22:40:48.5590963Z test_multi_dtype_multi_bucket (__main__.ReducerTest) 2023-01-11T22:40:48.5591555Z test_multi_dtype_single_bucket (__main__.ReducerTest) 2023-01-11T22:40:48.5592177Z test_single_dtype_single_bucket (__main__.ReducerTest) 2023-01-11T22:40:48.5592996Z ]> 2023-01-11T22:40:48.5593842Z test_logging_init (__main__.RendezvousEnvTest) 2023-01-11T22:40:48.5594191Z 2023-01-11T22:40:48.5594614Z ]> 2023-01-11T22:40:48.5595035Z test_default_store_timeout_gloo (__main__.TimeoutTest) 2023-01-11T22:40:48.5595691Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5596149Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5596725Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5597174Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5597640Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpe338oo3s 2023-01-11T22:40:48.5598282Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpe338oo3s/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5598583Z 2023-01-11T22:40:48.5598700Z Running tests... 2023-01-11T22:40:48.5599095Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.5599620Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.5600098Z test_broadcast_coalesced_gloo_cpu (__main__.CommTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.5600546Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 71808 2023-01-11T22:40:48.5600990Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 71809 2023-01-11T22:40:48.5601595Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5602044Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5602599Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5603164Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5603748Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5604170Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5604735Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5605195Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5605653Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpyzqh_jib 2023-01-11T22:40:48.5606172Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpyzqh_jib/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5606707Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3lj09so7 2023-01-11T22:40:48.5607235Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3lj09so7/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5607740Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.5608190Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.5608667Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.5609155Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.5609788Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.5610477Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.5610873Z ok (4.072s) 2023-01-11T22:40:48.5611021Z 2023-01-11T22:40:48.5611287Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.5611597Z Ran 1 test in 4.072s 2023-01-11T22:40:48.5611759Z 2023-01-11T22:40:48.5611853Z OK 2023-01-11T22:40:48.5611990Z 2023-01-11T22:40:48.5612114Z Generating XML reports... 2023-01-11T22:40:48.5612630Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111222654.xml 2023-01-11T22:40:48.5613278Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5613725Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5614296Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5614743Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5615287Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpuatq0z5d 2023-01-11T22:40:48.5615831Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpuatq0z5d/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5616128Z 2023-01-11T22:40:48.5616236Z Running tests... 
2023-01-11T22:40:48.5617239Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.5617823Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.5618306Z test_broadcast_coalesced_gloo_cuda (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.5618755Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 71919 2023-01-11T22:40:48.5619199Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 71920 2023-01-11T22:40:48.5619800Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5620251Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5620897Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5621386Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5621963Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5622389Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5622961Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5623418Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5623873Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqn5tb37u 2023-01-11T22:40:48.5624391Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqn5tb37u/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5624903Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.5625397Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary 
directory at /tmp/tmpfe31d_9u 2023-01-11T22:40:48.5625904Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfe31d_9u/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5626401Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.5626879Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.5627362Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.5627996Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.5628671Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.5629070Z ok (5.589s) 2023-01-11T22:40:48.5629218Z 2023-01-11T22:40:48.5629483Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.5629791Z Ran 1 test in 5.589s 2023-01-11T22:40:48.5629952Z 2023-01-11T22:40:48.5630046Z OK 2023-01-11T22:40:48.5630178Z 2023-01-11T22:40:48.5630301Z Generating XML reports... 
2023-01-11T22:40:48.5630818Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111222700.xml 2023-01-11T22:40:48.5631476Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5631919Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5632486Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5633018Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5633481Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpih9b5t99 2023-01-11T22:40:48.5634016Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpih9b5t99/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5634312Z 2023-01-11T22:40:48.5634403Z Running tests... 2023-01-11T22:40:48.5634806Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.5635327Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.5635793Z test_gloo_barrier_device_ids (__main__.CommTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.5636232Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 72032 2023-01-11T22:40:48.5636673Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 72033 2023-01-11T22:40:48.5637268Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5637701Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5638316Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5638790Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5639361Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5639787Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5640351Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5640810Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5641251Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7h2n1t1c 2023-01-11T22:40:48.5641789Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7h2n1t1c/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5642297Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.5642791Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3s9fz3ad 2023-01-11T22:40:48.5643301Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3s9fz3ad/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5643800Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.5644283Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.5644775Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.5645412Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.5646093Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.5646479Z ok (3.906s) 2023-01-11T22:40:48.5646630Z 2023-01-11T22:40:48.5646876Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.5647199Z Ran 1 test in 3.907s 2023-01-11T22:40:48.5647361Z 2023-01-11T22:40:48.5647455Z OK 2023-01-11T22:40:48.5647588Z 2023-01-11T22:40:48.5647710Z Generating XML reports... 2023-01-11T22:40:48.5648225Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111222708.xml 2023-01-11T22:40:48.5648873Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5649317Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5649865Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5650397Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5650861Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgv9_b5z0 2023-01-11T22:40:48.5651393Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgv9_b5z0/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5651690Z 2023-01-11T22:40:48.5651781Z Running tests... 
2023-01-11T22:40:48.5652183Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.5652705Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.5653155Z test_gloo_rank_membership (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.5653610Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 72141 2023-01-11T22:40:48.5654054Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 72142 2023-01-11T22:40:48.5654661Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5655152Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5655733Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5656197Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5657479Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5657916Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5658496Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5658962Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5659416Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzmdu1nc5 2023-01-11T22:40:48.5659957Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzmdu1nc5/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5660465Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.5660958Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at 
/tmp/tmp3qlf9grm 2023-01-11T22:40:48.5661470Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3qlf9grm/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5661969Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.5662450Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.5662918Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.5663569Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.5664255Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.5664783Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:40:48.5665252Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:40:48.5665892Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:40:48.5666569Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:40:48.5666959Z ok (4.004s) 2023-01-11T22:40:48.5667089Z 2023-01-11T22:40:48.5667359Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.5667684Z Ran 1 test in 4.005s 2023-01-11T22:40:48.5667951Z 2023-01-11T22:40:48.5668044Z OK 2023-01-11T22:40:48.5668177Z 2023-01-11T22:40:48.5668284Z Generating XML reports... 
2023-01-11T22:40:48.5668827Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111222714.xml 2023-01-11T22:40:48.5669480Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5669923Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5670472Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5670989Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5671451Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8m7jaavz 2023-01-11T22:40:48.5671967Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8m7jaavz/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5672269Z 2023-01-11T22:40:48.5672377Z Running tests... 2023-01-11T22:40:48.5672779Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.5673372Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.5673832Z test_gloo_warn_not_in_group (__main__.CommTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.5674289Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 72253 2023-01-11T22:40:48.5674732Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 72254 2023-01-11T22:40:48.5675313Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5675756Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5676323Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5676789Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5677342Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5677789Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5678349Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5678808Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5679247Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9l1dfaau 2023-01-11T22:40:48.5679779Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9l1dfaau/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5680285Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.5680757Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjj51il9b 2023-01-11T22:40:48.5681290Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjj51il9b/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5681794Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.5682275Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.5682746Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.5683399Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.5684083Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.5684611Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:40:48.5685145Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:40:48.5685795Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:40:48.5686470Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:40:48.5686839Z ok (5.519s) 2023-01-11T22:40:48.5686987Z 2023-01-11T22:40:48.5687250Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.5687573Z Ran 1 test in 5.520s 2023-01-11T22:40:48.5687734Z 2023-01-11T22:40:48.5687827Z OK 2023-01-11T22:40:48.5687941Z 2023-01-11T22:40:48.5688064Z Generating XML reports... 
2023-01-11T22:40:48.5688597Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111222720.xml 2023-01-11T22:40:48.5689247Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5689674Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5690290Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5690762Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5691221Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9mxlbdiu 2023-01-11T22:40:48.5691733Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9mxlbdiu/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5692031Z 2023-01-11T22:40:48.5692138Z Running tests... 2023-01-11T22:40:48.5692540Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.5693044Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.5693531Z test_sequence_num_incremented_gloo_default (__main__.CommTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.5694012Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 72367 2023-01-11T22:40:48.5694453Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 72368 2023-01-11T22:40:48.5695033Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5695476Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5696044Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5696511Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5697739Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5698184Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5698758Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5699202Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5699669Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpu4qzoerb 2023-01-11T22:40:48.5700210Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpu4qzoerb/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5700741Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjlb0i8o3 2023-01-11T22:40:48.5701250Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjlb0i8o3/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5701750Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.5702214Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.5702677Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.5703274Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.5703938Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.5704618Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.5705127Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:40:48.5705609Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:40:48.5706253Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:40:48.5706927Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:40:48.5707298Z ok (5.557s) 2023-01-11T22:40:48.5707445Z 2023-01-11T22:40:48.5707715Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.5708106Z Ran 1 test in 5.557s 2023-01-11T22:40:48.5708277Z 2023-01-11T22:40:48.5708353Z OK 2023-01-11T22:40:48.5708488Z 2023-01-11T22:40:48.5708611Z Generating XML reports... 
2023-01-11T22:40:48.5709151Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111222728.xml 2023-01-11T22:40:48.5709808Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5710239Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5710809Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5711273Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5711741Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpf0w405gy 2023-01-11T22:40:48.5712261Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpf0w405gy/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5712559Z 2023-01-11T22:40:48.5712668Z Running tests... 2023-01-11T22:40:48.5713066Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.5713570Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.5714064Z test_sequence_num_incremented_gloo_subgroup (__main__.CommTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.5714540Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 72484 2023-01-11T22:40:48.5714985Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 72485 2023-01-11T22:40:48.5715568Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5716014Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5716584Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5717032Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5717602Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5718041Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5718607Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5719048Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5719509Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfi6kbpmy 2023-01-11T22:40:48.5720111Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfi6kbpmy/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5720603Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.5721104Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxrsa8scf 2023-01-11T22:40:48.5721631Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxrsa8scf/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5722136Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.5722508Z 
skip: Need at least 4 CUDA devices (3.962s) 2023-01-11T22:40:48.5722698Z 2023-01-11T22:40:48.5722969Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.5723297Z Ran 1 test in 3.962s 2023-01-11T22:40:48.5723458Z 2023-01-11T22:40:48.5723549Z OK (skipped=1) 2023-01-11T22:40:48.5723704Z 2023-01-11T22:40:48.5723827Z Generating XML reports... 2023-01-11T22:40:48.5724362Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111222736.xml 2023-01-11T22:40:48.5725063Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5725501Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5726071Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5726536Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5726996Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxqfa4gbw 2023-01-11T22:40:48.5727511Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxqfa4gbw/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5727812Z 2023-01-11T22:40:48.5727919Z Running tests... 2023-01-11T22:40:48.5728322Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.5728827Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.5729311Z test_sequence_num_set_default_pg_gloo (__main__.CommTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.5729779Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 72587 2023-01-11T22:40:48.5730222Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 72588 2023-01-11T22:40:48.5730804Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5731248Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5731811Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5732258Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5732827Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5733269Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5733828Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5734267Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5734724Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5xousbqw 2023-01-11T22:40:48.5735261Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5xousbqw/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5735765Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5l2x4myb 2023-01-11T22:40:48.5736287Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5l2x4myb/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5737341Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.5737926Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.5738394Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.5738883Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.5739551Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.5740238Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.5740610Z ok (3.949s) 2023-01-11T22:40:48.5740758Z 2023-01-11T22:40:48.5741023Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.5741344Z Ran 1 test in 3.949s 2023-01-11T22:40:48.5741503Z 2023-01-11T22:40:48.5741578Z OK 2023-01-11T22:40:48.5741716Z 2023-01-11T22:40:48.5741840Z Generating XML reports... 2023-01-11T22:40:48.5742373Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111222742.xml 2023-01-11T22:40:48.5743091Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5743529Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5744103Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5744570Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5745010Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1h968one 2023-01-11T22:40:48.5745541Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1h968one/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5745841Z 2023-01-11T22:40:48.5745951Z Running tests... 
2023-01-11T22:40:48.5746350Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.5746856Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.5747330Z test_sequence_num_set_gloo_new_group (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.5747798Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 72696 2023-01-11T22:40:48.5748245Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 72697 2023-01-11T22:40:48.5748824Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5749269Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5749835Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5750300Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5750857Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5751300Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5751864Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5752306Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5752765Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpevfqm4jk 2023-01-11T22:40:48.5753299Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpevfqm4jk/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5753827Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7f3fyz_w 2023-01-11T22:40:48.5754335Z INFO:torch.distributed.nn.jit.instantiator:Writing 
/tmp/tmp7f3fyz_w/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5754905Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.5755372Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.5755858Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.5756328Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.5756982Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.5757662Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.5758170Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:40:48.5758655Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:40:48.5759308Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:40:48.5760034Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:40:48.5760415Z ok (4.004s) 2023-01-11T22:40:48.5760562Z 2023-01-11T22:40:48.5760833Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.5761159Z Ran 1 test in 4.004s 2023-01-11T22:40:48.5761319Z 2023-01-11T22:40:48.5761412Z OK 2023-01-11T22:40:48.5761528Z 2023-01-11T22:40:48.5761653Z Generating XML reports... 
2023-01-11T22:40:48.5762186Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111222749.xml 2023-01-11T22:40:48.5762843Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5763272Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5763844Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5764310Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5764773Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqsz_dv1s 2023-01-11T22:40:48.5765286Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqsz_dv1s/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5765582Z 2023-01-11T22:40:48.5765690Z Running tests... 2023-01-11T22:40:48.5766084Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.5766587Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.5767053Z test_tensor_dtype_complex (__main__.CommTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.5767505Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 72811 2023-01-11T22:40:48.5768016Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 72812 2023-01-11T22:40:48.5768612Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5769056Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5769623Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5770070Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5770679Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5771124Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5771694Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5772205Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5772662Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzwneo0no 2023-01-11T22:40:48.5773199Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzwneo0no/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5773705Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpme2eiw8x 2023-01-11T22:40:48.5774234Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpme2eiw8x/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5774734Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.5775200Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.5775667Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.5776163Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.5777391Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.5778191Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.5778579Z ok (3.948s) 2023-01-11T22:40:48.5778726Z 2023-01-11T22:40:48.5778996Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.5779319Z Ran 1 test in 3.949s 2023-01-11T22:40:48.5779478Z 2023-01-11T22:40:48.5779553Z OK 2023-01-11T22:40:48.5779685Z 2023-01-11T22:40:48.5779807Z Generating XML reports... 2023-01-11T22:40:48.5780346Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111222755.xml 2023-01-11T22:40:48.5781000Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5781436Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5782008Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5782475Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5782919Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1ufmdgdt 2023-01-11T22:40:48.5783454Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1ufmdgdt/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5783753Z 2023-01-11T22:40:48.5783863Z Running tests... 
2023-01-11T22:40:48.5784262Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.5784767Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.5785232Z test_tensor_dtype_mismatch (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.5785685Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 72920 2023-01-11T22:40:48.5786111Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 72921 2023-01-11T22:40:48.5786722Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5787168Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5787734Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5788181Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5788754Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5789190Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5789756Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5790282Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5790740Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2vtk7kwv 2023-01-11T22:40:48.5791276Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2vtk7kwv/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5791762Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.5792257Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at 
/tmp/tmpobvw5r7s 2023-01-11T22:40:48.5792788Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpobvw5r7s/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5793289Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.5793753Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.5794246Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.5794949Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.5795645Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.5796665Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2588: UserWarning: torch.distributed.all_gather_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2023-01-11T22:40:48.5797288Z warnings.warn( 2023-01-11T22:40:48.5798149Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2588: UserWarning: torch.distributed.all_gather_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2023-01-11T22:40:48.5798765Z warnings.warn( 2023-01-11T22:40:48.5799607Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1714: UserWarning: torch.distributed.all_reduce_coalesced will be deprecated. 
If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2023-01-11T22:40:48.5800213Z warnings.warn( 2023-01-11T22:40:48.5801060Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1714: UserWarning: torch.distributed.all_reduce_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2023-01-11T22:40:48.5801658Z warnings.warn( 2023-01-11T22:40:48.5801880Z ok (4.019s) 2023-01-11T22:40:48.5802026Z 2023-01-11T22:40:48.5802293Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.5802623Z Ran 1 test in 4.019s 2023-01-11T22:40:48.5802782Z 2023-01-11T22:40:48.5802858Z OK 2023-01-11T22:40:48.5802992Z 2023-01-11T22:40:48.5803114Z Generating XML reports... 2023-01-11T22:40:48.5803650Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111222801.xml 2023-01-11T22:40:48.5804304Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5804730Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5805300Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5805758Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5806215Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp050ft4qj 2023-01-11T22:40:48.5806731Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp050ft4qj/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5807092Z 2023-01-11T22:40:48.5807201Z Running tests... 
2023-01-11T22:40:48.5807606Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.5808109Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.5808582Z test_allgather_work_wait_cpu (__main__.CompilerTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.5809042Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 73029 2023-01-11T22:40:48.5809487Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 73030 2023-01-11T22:40:48.5810070Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5810516Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5811083Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5811532Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5812157Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5812606Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5813177Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5813620Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5814085Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0v3x7db9 2023-01-11T22:40:48.5842875Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0v3x7db9/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5843493Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxuagifl3 2023-01-11T22:40:48.5844067Z INFO:torch.distributed.nn.jit.instantiator:Writing 
/tmp/tmpxuagifl3/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5844585Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.5845085Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.5845592Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.5846114Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.5846824Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.5847550Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.5848518Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:40:48.5849292Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:40:48.5850168Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:40:48.5850929Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:40:48.5851268Z ok (4.057s) 2023-01-11T22:40:48.5851423Z 2023-01-11T22:40:48.5851704Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.5852029Z Ran 1 test in 4.057s 2023-01-11T22:40:48.5852199Z 2023-01-11T22:40:48.5852291Z OK 2023-01-11T22:40:48.5852581Z 2023-01-11T22:40:48.5852709Z Generating XML 
reports... 2023-01-11T22:40:48.5853284Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111222808.xml 2023-01-11T22:40:48.5853995Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5854468Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5855069Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5855550Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5856034Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp96m7wpnh 2023-01-11T22:40:48.5856881Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp96m7wpnh/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5857213Z 2023-01-11T22:40:48.5857325Z Running tests... 2023-01-11T22:40:48.5857752Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.5858305Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.5858895Z test_allgather_work_wait_gpu (__main__.CompilerTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.5859383Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 73138 2023-01-11T22:40:48.5859858Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 73139 2023-01-11T22:40:48.5860498Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5860971Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5861552Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5862044Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5862656Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5863113Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5863717Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5864206Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5864687Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplil4rlvl 2023-01-11T22:40:48.5865229Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplil4rlvl/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5865760Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.5866277Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpoerykmhu 2023-01-11T22:40:48.5866839Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpoerykmhu/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5867357Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.5867866Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.5868385Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.5869058Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.5869777Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.5870800Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:40:48.5871680Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:40:48.5872558Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:40:48.5873319Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:40:48.5873658Z ok (5.570s) 2023-01-11T22:40:48.5873813Z 2023-01-11T22:40:48.5874091Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.5874414Z Ran 1 test in 5.570s 2023-01-11T22:40:48.5874585Z 2023-01-11T22:40:48.5874677Z OK 2023-01-11T22:40:48.5874813Z 2023-01-11T22:40:48.5874941Z Generating XML reports... 
2023-01-11T22:40:48.5875516Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111222814.xml 2023-01-11T22:40:48.5876209Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5876737Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5877353Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5877831Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5878309Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmps_nf5_mh 2023-01-11T22:40:48.5878863Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmps_nf5_mh/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5879174Z 2023-01-11T22:40:48.5879284Z Running tests... 2023-01-11T22:40:48.5879689Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.5880240Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.5880748Z test_allreduce_work_wait_cpu (__main__.CompilerTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.5881222Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 73249 2023-01-11T22:40:48.5881698Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 73250 2023-01-11T22:40:48.5882332Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5882803Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5883388Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5883879Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5884483Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5884941Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5885540Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5886032Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5886512Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5sbi4ice 2023-01-11T22:40:48.5887053Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5sbi4ice/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5887583Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.5888100Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpw38ztf1h 2023-01-11T22:40:48.5888657Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpw38ztf1h/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5889166Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.5889738Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.5890264Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.5890936Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.5891661Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.5892631Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:40:48.5893390Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:40:48.5894339Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant1 target _tensor_constant1 _tensor_constant1 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:40:48.5895089Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:40:48.5895975Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:40:48.5896991Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:40:48.5897862Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant1 target _tensor_constant1 _tensor_constant1 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 
2023-01-11T22:40:48.5898563Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:40:48.5898885Z ok (4.054s) 2023-01-11T22:40:48.5899039Z 2023-01-11T22:40:48.5899307Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.5899634Z Ran 1 test in 4.054s 2023-01-11T22:40:48.5899775Z 2023-01-11T22:40:48.5899868Z OK 2023-01-11T22:40:48.5900000Z 2023-01-11T22:40:48.5900123Z Generating XML reports... 2023-01-11T22:40:48.5900674Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111222822.xml 2023-01-11T22:40:48.5901325Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5901774Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5902344Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5902813Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5903259Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbxm7xwhw 2023-01-11T22:40:48.5903794Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbxm7xwhw/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5904096Z 2023-01-11T22:40:48.5904204Z Running tests... 2023-01-11T22:40:48.5904584Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.5905107Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.5905584Z test_allreduce_work_wait_gpu (__main__.CompilerTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.5906047Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 73358 2023-01-11T22:40:48.5906477Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 73359 2023-01-11T22:40:48.5907178Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5907629Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5908178Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5908643Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5909213Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5909654Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5910201Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5910665Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5911127Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp97jrr5zf 2023-01-11T22:40:48.5911661Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp97jrr5zf/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5912210Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.5912718Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqrfwc9lf 2023-01-11T22:40:48.5913249Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqrfwc9lf/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5913733Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.5914214Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.5914708Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.5915364Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.5916036Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.5916954Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:40:48.5917663Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:40:48.5918502Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant1 target _tensor_constant1 _tensor_constant1 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:40:48.5919189Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:40:48.5920040Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:40:48.5920742Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:40:48.5921580Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant1 target _tensor_constant1 _tensor_constant1 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 
2023-01-11T22:40:48.5922280Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:40:48.5922584Z ok (5.563s) 2023-01-11T22:40:48.5922732Z 2023-01-11T22:40:48.5922997Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.5923320Z Ran 1 test in 5.563s 2023-01-11T22:40:48.5923541Z 2023-01-11T22:40:48.5923616Z OK 2023-01-11T22:40:48.5923749Z 2023-01-11T22:40:48.5923871Z Generating XML reports... 2023-01-11T22:40:48.5924426Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111222828.xml 2023-01-11T22:40:48.5925093Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5925523Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5926090Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5926555Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5926990Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpeehlv9vf 2023-01-11T22:40:48.5927526Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpeehlv9vf/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5927829Z 2023-01-11T22:40:48.5927938Z Running tests... 2023-01-11T22:40:48.5928338Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.5928893Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.5929377Z test_broadcast_work_wait_cpu (__main__.CompilerTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.5929839Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 73469 2023-01-11T22:40:48.5930267Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 73470 2023-01-11T22:40:48.5930869Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5931297Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5931859Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5932309Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5932880Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5933319Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5933883Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5934326Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5934783Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3xfkspi2 2023-01-11T22:40:48.5935315Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3xfkspi2/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5935802Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.5936291Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpg7wwsgte 2023-01-11T22:40:48.5937071Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpg7wwsgte/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5937558Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.5938017Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.5938503Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.5939149Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.5939821Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.5940713Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:40:48.5941537Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:40:48.5942358Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:40:48.5943053Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:40:48.5943359Z ok (4.044s) 2023-01-11T22:40:48.5943507Z 2023-01-11T22:40:48.5943772Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.5944092Z Ran 1 test in 4.044s 2023-01-11T22:40:48.5944251Z 2023-01-11T22:40:48.5944327Z OK 2023-01-11T22:40:48.5944460Z 2023-01-11T22:40:48.5944583Z Generating XML reports... 
2023-01-11T22:40:48.5945138Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111222836.xml 2023-01-11T22:40:48.5945866Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5946306Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5946875Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5947344Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5947782Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpt1tceqrw 2023-01-11T22:40:48.5948316Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpt1tceqrw/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5948615Z 2023-01-11T22:40:48.5948724Z Running tests... 2023-01-11T22:40:48.5949123Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.5949628Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.5950107Z test_broadcast_work_wait_gpu (__main__.CompilerTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.5950571Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 73578 2023-01-11T22:40:48.5951017Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 73579 2023-01-11T22:40:48.5951598Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5952041Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5952606Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5953052Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5953619Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5954060Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5954633Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5955074Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5955528Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9_unjezp 2023-01-11T22:40:48.5956061Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9_unjezp/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5956570Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpt8bro_4c 2023-01-11T22:40:48.5957098Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpt8bro_4c/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5957598Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.5958124Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.5958591Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.5959247Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.5959775Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.5960417Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.5961318Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:40:48.5962027Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:40:48.5962917Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:40:48.5963634Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:40:48.5963938Z ok (5.680s) 2023-01-11T22:40:48.5964084Z 2023-01-11T22:40:48.5964347Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.5964672Z Ran 1 test in 5.680s 2023-01-11T22:40:48.5964831Z 2023-01-11T22:40:48.5964906Z OK 2023-01-11T22:40:48.5965039Z 2023-01-11T22:40:48.5965162Z Generating XML reports... 
2023-01-11T22:40:48.5965707Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111222843.xml 2023-01-11T22:40:48.5966375Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5966797Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5967369Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5967834Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5968293Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjkwj4ubq 2023-01-11T22:40:48.5968811Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjkwj4ubq/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5969109Z 2023-01-11T22:40:48.5969218Z Running tests... 2023-01-11T22:40:48.5969616Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.5970112Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.5970651Z test_consecutive_comm_work_wait_cpu (__main__.CompilerTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.5971129Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 73689 2023-01-11T22:40:48.5971578Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 73690 2023-01-11T22:40:48.5972163Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5972602Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5973165Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5973611Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5974179Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5974615Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5975266Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5975709Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5976164Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfqe_vp35 2023-01-11T22:40:48.5976930Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfqe_vp35/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5977442Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_bfaodyg 2023-01-11T22:40:48.5977924Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.5978431Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_bfaodyg/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5978926Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.5979396Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.5979966Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.5980641Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.5981323Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.5982218Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:40:48.5982923Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:40:48.5983764Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant1 target _tensor_constant1 _tensor_constant1 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:40:48.5984480Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:40:48.5985299Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant2 target _tensor_constant2 _tensor_constant2 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:40:48.5986006Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:40:48.5986845Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant3 target _tensor_constant3 _tensor_constant3 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 
2023-01-11T22:40:48.5987548Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:40:48.5988391Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:40:48.5989073Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:40:48.5989910Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant1 target _tensor_constant1 _tensor_constant1 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:40:48.5990607Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:40:48.5991436Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant2 target _tensor_constant2 _tensor_constant2 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:40:48.5992241Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:40:48.5993059Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant3 target _tensor_constant3 _tensor_constant3 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:40:48.5993754Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:40:48.5994075Z ok (4.068s) 2023-01-11T22:40:48.5994222Z 2023-01-11T22:40:48.5994470Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.5994794Z Ran 1 test in 4.068s 2023-01-11T22:40:48.5994950Z 2023-01-11T22:40:48.5995043Z OK 2023-01-11T22:40:48.5995174Z 
2023-01-11T22:40:48.5995297Z Generating XML reports... 2023-01-11T22:40:48.5995834Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111222851.xml 2023-01-11T22:40:48.5996550Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.5997003Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.5997553Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.5998015Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.5998473Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxfkjtxu1 2023-01-11T22:40:48.5999009Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxfkjtxu1/_remote_module_non_scriptable.py 2023-01-11T22:40:48.5999309Z 2023-01-11T22:40:48.5999400Z Running tests... 2023-01-11T22:40:48.5999793Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6000318Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6000794Z test_consecutive_comm_work_wait_gpu (__main__.CompilerTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6001269Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 73798 2023-01-11T22:40:48.6001716Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 73799 2023-01-11T22:40:48.6002315Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6002743Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6003308Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6003770Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6004343Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6004763Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6005322Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6005779Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6006219Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5uqvf1qs 2023-01-11T22:40:48.6006750Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5uqvf1qs/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6007280Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpul9_2v4k 2023-01-11T22:40:48.6007800Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpul9_2v4k/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6008280Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6008810Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6009295Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6009770Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.6010423Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6011103Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6012016Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:40:48.6012724Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:40:48.6013591Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant1 target _tensor_constant1 _tensor_constant1 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:40:48.6014308Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:40:48.6015149Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant2 target _tensor_constant2 _tensor_constant2 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:40:48.6015853Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:40:48.6016932Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant3 target _tensor_constant3 _tensor_constant3 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 
2023-01-11T22:40:48.6017666Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:40:48.6018505Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:40:48.6019202Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:40:48.6020035Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant1 target _tensor_constant1 _tensor_constant1 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:40:48.6020718Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:40:48.6021562Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant2 target _tensor_constant2 _tensor_constant2 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:40:48.6022259Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:40:48.6023095Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant3 target _tensor_constant3 _tensor_constant3 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:40:48.6023772Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:40:48.6024092Z ok (5.520s) 2023-01-11T22:40:48.6024238Z 2023-01-11T22:40:48.6024504Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6024928Z Ran 1 test in 5.520s 2023-01-11T22:40:48.6025070Z 2023-01-11T22:40:48.6025162Z OK 2023-01-11T22:40:48.6025294Z 
2023-01-11T22:40:48.6025421Z Generating XML reports... 2023-01-11T22:40:48.6025979Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111222857.xml 2023-01-11T22:40:48.6026626Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6027072Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6027638Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6028102Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6028542Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpssqbvt53 2023-01-11T22:40:48.6029074Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpssqbvt53/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6029376Z 2023-01-11T22:40:48.6029484Z Running tests... 2023-01-11T22:40:48.6029866Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6030452Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6030946Z test_nested_comm_tensor_wrapping (__main__.CompilerTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6031413Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 73909 2023-01-11T22:40:48.6031842Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 73910 2023-01-11T22:40:48.6032435Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6032880Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6033425Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6033892Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6034462Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6034909Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6035456Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6035916Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6036374Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphhh64qd6 2023-01-11T22:40:48.6036908Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphhh64qd6/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6037397Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6037893Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfazxq4gd 2023-01-11T22:40:48.6038424Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfazxq4gd/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6038912Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6039395Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6039885Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.6040531Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6041194Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6042106Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:40:48.6042885Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:40:48.6043724Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant1 target _tensor_constant1 _tensor_constant1 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:40:48.6044407Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:40:48.6045250Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:40:48.6045948Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:40:48.6046840Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant1 target _tensor_constant1 _tensor_constant1 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 
2023-01-11T22:40:48.6047550Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:40:48.6047855Z ok (4.000s) 2023-01-11T22:40:48.6048002Z 2023-01-11T22:40:48.6048264Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6048588Z Ran 1 test in 4.000s 2023-01-11T22:40:48.6048746Z 2023-01-11T22:40:48.6048821Z OK 2023-01-11T22:40:48.6048957Z 2023-01-11T22:40:48.6049079Z Generating XML reports... 2023-01-11T22:40:48.6049624Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111222905.xml 2023-01-11T22:40:48.6050286Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6050718Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6051286Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6051751Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6052193Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbbt2hd0e 2023-01-11T22:40:48.6052729Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbbt2hd0e/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6053029Z 2023-01-11T22:40:48.6053137Z Running tests... 2023-01-11T22:40:48.6053535Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6054040Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6054513Z test_scatter_work_wait_cpu (__main__.CompilerTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6054978Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 74018 2023-01-11T22:40:48.6055412Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 74019 2023-01-11T22:40:48.6056011Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6056451Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6057274Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6057722Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6058293Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6058730Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6059385Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6059832Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6060291Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfktrg9dm 2023-01-11T22:40:48.6060824Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfktrg9dm/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6061338Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdsf0z5_p 2023-01-11T22:40:48.6061867Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdsf0z5_p/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6062368Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6062833Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6063297Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6063784Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.6064505Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6065185Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6066103Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:40:48.6066814Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:40:48.6067652Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:40:48.6068428Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:40:48.6068732Z ok (4.034s) 2023-01-11T22:40:48.6068880Z 2023-01-11T22:40:48.6069142Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6069466Z Ran 1 test in 4.034s 2023-01-11T22:40:48.6069624Z 2023-01-11T22:40:48.6069700Z OK 2023-01-11T22:40:48.6069832Z 2023-01-11T22:40:48.6069955Z Generating XML reports... 
2023-01-11T22:40:48.6070502Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111222911.xml 2023-01-11T22:40:48.6071210Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6071635Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6072206Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6072676Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6073119Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgv2ds4h6 2023-01-11T22:40:48.6073650Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgv2ds4h6/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6073950Z 2023-01-11T22:40:48.6074057Z Running tests... 2023-01-11T22:40:48.6074456Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6074958Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6075430Z test_scatter_work_wait_gpu (__main__.CompilerTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6075890Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 74127 2023-01-11T22:40:48.6076389Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 74128 2023-01-11T22:40:48.6076996Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6077441Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6078010Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6078461Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6079028Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6079466Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6080027Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6080469Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6080924Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1axdrse4 2023-01-11T22:40:48.6081504Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1axdrse4/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6082021Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpg6ukqpv2 2023-01-11T22:40:48.6082548Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpg6ukqpv2/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6083052Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6083517Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6083979Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6084469Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.6085127Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6085811Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6086708Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:40:48.6087420Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:40:48.6088260Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:40:48.6088964Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:40:48.6089269Z ok (5.612s) 2023-01-11T22:40:48.6089414Z 2023-01-11T22:40:48.6089680Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6089794Z Ran 1 test in 5.612s 2023-01-11T22:40:48.6089814Z 2023-01-11T22:40:48.6089907Z OK 2023-01-11T22:40:48.6089926Z 2023-01-11T22:40:48.6090050Z Generating XML reports... 
2023-01-11T22:40:48.6090444Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111222917.xml 2023-01-11T22:40:48.6090793Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6090967Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6091340Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6091587Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6091843Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpd4bu4n5j 2023-01-11T22:40:48.6092114Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpd4bu4n5j/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6092135Z 2023-01-11T22:40:48.6092243Z Running tests... 2023-01-11T22:40:48.6092511Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6092819Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6093023Z test_ddp_checkpointing_dynamic_module (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.6093388Z Dynamic module can be checkpointed, multiple times, with non-reentrant ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6093605Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 74238 2023-01-11T22:40:48.6093824Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 74239 2023-01-11T22:40:48.6094237Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6094418Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6094795Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6094982Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6095323Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6095499Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6095866Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6096051Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6096308Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpf1axoe0u 2023-01-11T22:40:48.6096965Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpf1axoe0u/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6097211Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6097462Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp30utapri 2023-01-11T22:40:48.6097731Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp30utapri/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6097939Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6098184Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6098426Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.6098840Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6099234Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6099337Z ok (6.073s) 2023-01-11T22:40:48.6099357Z 2023-01-11T22:40:48.6099618Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6099732Z Ran 1 test in 6.073s 2023-01-11T22:40:48.6099752Z 2023-01-11T22:40:48.6099845Z OK 2023-01-11T22:40:48.6099864Z 2023-01-11T22:40:48.6099969Z Generating XML reports... 2023-01-11T22:40:48.6100426Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111222925.xml 2023-01-11T22:40:48.6100791Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6101076Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6101456Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6101650Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6101904Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp17e3wdmf 2023-01-11T22:40:48.6102171Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp17e3wdmf/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6102191Z 2023-01-11T22:40:48.6102282Z Running tests... 
2023-01-11T22:40:48.6102544Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6102852Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6103091Z test_ddp_checkpointing_dynamic_weight_sharing (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.6103361Z Dynamic module can be checkpointed multiple times with weight sharing ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6103577Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 74353 2023-01-11T22:40:48.6103857Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 74354 2023-01-11T22:40:48.6104241Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6104416Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6104775Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6106501Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6106873Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6107050Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6107427Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6107619Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6107872Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpthojixgq 2023-01-11T22:40:48.6108142Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpthojixgq/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6108368Z INFO:torch.testing._internal.common_distributed:Starting 
event listener thread for rank 1 2023-01-11T22:40:48.6108606Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvplg85sq 2023-01-11T22:40:48.6108868Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvplg85sq/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6109094Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6109339Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6109583Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.6109982Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6110374Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6110477Z ok (6.046s) 2023-01-11T22:40:48.6110496Z 2023-01-11T22:40:48.6110758Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6110853Z Ran 1 test in 6.047s 2023-01-11T22:40:48.6110872Z 2023-01-11T22:40:48.6110964Z OK 2023-01-11T22:40:48.6110983Z 2023-01-11T22:40:48.6111106Z Generating XML reports... 
2023-01-11T22:40:48.6111561Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111222934.xml 2023-01-11T22:40:48.6111995Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6112174Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6112550Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6112740Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6112978Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpke3twckf 2023-01-11T22:40:48.6113246Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpke3twckf/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6113266Z 2023-01-11T22:40:48.6113373Z Running tests... 2023-01-11T22:40:48.6113633Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6113941Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6114183Z test_ddp_checkpointing_once_use_reentrant_False (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.6114481Z DDP works as expected when layer is checkpointed only once. ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6114705Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 74468 2023-01-11T22:40:48.6114899Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 74469 2023-01-11T22:40:48.6115272Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6115447Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6115825Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6116016Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6116388Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6116565Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6116936Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6117125Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6117362Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4mf9940w 2023-01-11T22:40:48.6117626Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4mf9940w/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6117852Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6118104Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpb5vv3okp 2023-01-11T22:40:48.6118372Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpb5vv3okp/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6118603Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6118848Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6119088Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.6119488Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6119863Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6120098Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6120328Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6120607Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6120835Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6121743Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1911: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2023-01-11T22:40:48.6121857Z warnings.warn( 2023-01-11T22:40:48.6122759Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1911: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 
2023-01-11T22:40:48.6122875Z warnings.warn( 2023-01-11T22:40:48.6123107Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6123361Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6123593Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6123819Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6124049Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6124272Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6124493Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6124718Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6124823Z ok (6.107s) 2023-01-11T22:40:48.6124844Z 2023-01-11T22:40:48.6125093Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6125210Z Ran 1 test in 6.107s 2023-01-11T22:40:48.6125230Z 2023-01-11T22:40:48.6125321Z OK 2023-01-11T22:40:48.6125341Z 2023-01-11T22:40:48.6125463Z Generating XML reports... 
2023-01-11T22:40:48.6125919Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111222942.xml 2023-01-11T22:40:48.6126285Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6126461Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6126834Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6127024Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6127265Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpot0t9jr8 2023-01-11T22:40:48.6127535Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpot0t9jr8/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6127556Z 2023-01-11T22:40:48.6127664Z Running tests... 2023-01-11T22:40:48.6127926Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6128234Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6128471Z test_ddp_checkpointing_once_use_reentrant_True (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.6128718Z DDP works as expected when layer is checkpointed only once. ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6128935Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 74583 2023-01-11T22:40:48.6129134Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 74584 2023-01-11T22:40:48.6129558Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6129737Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6130113Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6130304Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6130666Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6130837Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6131207Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6131395Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6131634Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpflcsjkz9 2023-01-11T22:40:48.6131904Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpflcsjkz9/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6132200Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8g9j16br 2023-01-11T22:40:48.6132471Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8g9j16br/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6132697Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6132919Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6133160Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6133400Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.6133800Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6134181Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6134416Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6134647Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6134877Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6135105Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6136010Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1911: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2023-01-11T22:40:48.6136127Z warnings.warn( 2023-01-11T22:40:48.6137343Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1911: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 
2023-01-11T22:40:48.6137460Z warnings.warn( 2023-01-11T22:40:48.6137691Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6137898Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6138125Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6138352Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6138666Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6138893Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6139112Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6139337Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6139437Z ok (6.144s) 2023-01-11T22:40:48.6139457Z 2023-01-11T22:40:48.6139714Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6139828Z Ran 1 test in 6.144s 2023-01-11T22:40:48.6139847Z 2023-01-11T22:40:48.6139940Z OK 2023-01-11T22:40:48.6139959Z 2023-01-11T22:40:48.6140083Z Generating XML reports... 
2023-01-11T22:40:48.6140544Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111222951.xml 2023-01-11T22:40:48.6140917Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6141155Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6141545Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6141734Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6141971Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpuktvfs5v 2023-01-11T22:40:48.6142239Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpuktvfs5v/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6142258Z 2023-01-11T22:40:48.6142367Z Running tests... 2023-01-11T22:40:48.6142630Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6142937Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6143204Z test_ddp_checkpointing_twice_static_graph_use_reentrant_False (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.6143554Z Regardless of reentrant or non-reentrant checkpointing impl, ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6143771Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 74698 2023-01-11T22:40:48.6143970Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 74699 2023-01-11T22:40:48.6144338Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6144512Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6144887Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6145077Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6145439Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6145610Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6145981Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6146171Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6146408Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfdepxoie 2023-01-11T22:40:48.6146678Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfdepxoie/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6146904Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6147157Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptljbsz5h 2023-01-11T22:40:48.6147428Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptljbsz5h/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6147713Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6147959Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6148201Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.6148600Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6148975Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6149209Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6149442Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6149675Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6149904Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6150074Z ok (6.059s) 2023-01-11T22:40:48.6150096Z 2023-01-11T22:40:48.6150366Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6150477Z Ran 1 test in 6.059s 2023-01-11T22:40:48.6150496Z 2023-01-11T22:40:48.6150572Z OK 2023-01-11T22:40:48.6150590Z 2023-01-11T22:40:48.6150714Z Generating XML reports... 
2023-01-11T22:40:48.6151170Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111222959.xml 2023-01-11T22:40:48.6151536Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6151712Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6152087Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6152285Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6152542Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5ghhzu1u 2023-01-11T22:40:48.6152810Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5ghhzu1u/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6152831Z 2023-01-11T22:40:48.6152922Z Running tests... 2023-01-11T22:40:48.6153184Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6153490Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6153749Z test_ddp_checkpointing_twice_static_graph_use_reentrant_True (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.6154093Z Regardless of reentrant or non-reentrant checkpointing impl, ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6154312Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 74813 2023-01-11T22:40:48.6154528Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 74814 2023-01-11T22:40:48.6154896Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6155055Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6155431Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6155620Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6155982Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6156155Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6156522Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6156766Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6157025Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmpzwgbuj 2023-01-11T22:40:48.6157295Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmpzwgbuj/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6157504Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6157753Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpy0ew0zo3 2023-01-11T22:40:48.6158018Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpy0ew0zo3/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6158243Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6158486Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6158731Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.6159176Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6159575Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6159807Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6160023Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6160251Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6160475Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6160575Z ok (6.004s) 2023-01-11T22:40:48.6160595Z 2023-01-11T22:40:48.6160858Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6160976Z Ran 1 test in 6.004s 2023-01-11T22:40:48.6160995Z 2023-01-11T22:40:48.6161087Z OK 2023-01-11T22:40:48.6161106Z 2023-01-11T22:40:48.6161232Z Generating XML reports... 
2023-01-11T22:40:48.6161672Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223007.xml 2023-01-11T22:40:48.6162041Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6162216Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6162587Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6162773Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6163029Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpuewp01pf 2023-01-11T22:40:48.6163300Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpuewp01pf/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6163320Z 2023-01-11T22:40:48.6163428Z Running tests... 2023-01-11T22:40:48.6163677Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6163987Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6164228Z test_ddp_checkpointing_twice_use_reentrant_False (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.6164601Z Checkpoitning twice fails for non-static graph with reentrant checkpoint ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6164821Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 74928 2023-01-11T22:40:48.6165036Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 74929 2023-01-11T22:40:48.6165401Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6165643Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6166025Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6166196Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6166555Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6166726Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6167095Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6167280Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6167530Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqptlctqt 2023-01-11T22:40:48.6167800Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqptlctqt/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6168028Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6168325Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6cr75xbm 2023-01-11T22:40:48.6168578Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6cr75xbm/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6168803Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6169043Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6169285Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.6169686Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6170078Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6170317Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6170552Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6171382Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2023-01-11T22:40:48.6172162Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. 
(function operator()) 2023-01-11T22:40:48.6172399Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6172631Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6172715Z ok (6.181s) 2023-01-11T22:40:48.6172735Z 2023-01-11T22:40:48.6173006Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6173118Z Ran 1 test in 6.181s 2023-01-11T22:40:48.6173138Z 2023-01-11T22:40:48.6173231Z OK 2023-01-11T22:40:48.6173249Z 2023-01-11T22:40:48.6173372Z Generating XML reports... 2023-01-11T22:40:48.6173895Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223016.xml 2023-01-11T22:40:48.6174266Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6174442Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6174798Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6174986Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6175237Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpk2i57nio 2023-01-11T22:40:48.6175506Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpk2i57nio/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6175526Z 2023-01-11T22:40:48.6175636Z Running tests... 
2023-01-11T22:40:48.6175899Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6176210Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6176494Z test_ddp_checkpointing_twice_use_reentrant_True (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.6177139Z Checkpoitning twice fails for non-static graph with reentrant checkpoint ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6177344Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 75043 2023-01-11T22:40:48.6177561Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 75044 2023-01-11T22:40:48.6177931Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6178106Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6178479Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6178675Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6179038Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6179209Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6179560Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6179749Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6180004Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpluih6skx 2023-01-11T22:40:48.6180273Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpluih6skx/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6180525Z INFO:torch.distributed.nn.jit.instantiator:Created a 
temporary directory at /tmp/tmpaybfko2u 2023-01-11T22:40:48.6180793Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpaybfko2u/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6181024Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6181246Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6181488Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6181710Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.6182112Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6182505Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6182606Z ok (5.947s) 2023-01-11T22:40:48.6182626Z 2023-01-11T22:40:48.6182888Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6183090Z Ran 1 test in 5.947s 2023-01-11T22:40:48.6183111Z 2023-01-11T22:40:48.6183204Z OK 2023-01-11T22:40:48.6183223Z 2023-01-11T22:40:48.6183345Z Generating XML reports... 
2023-01-11T22:40:48.6183812Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223024.xml 2023-01-11T22:40:48.6184161Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6184336Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6184712Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6184903Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6185156Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpx0qr_kp8 2023-01-11T22:40:48.6185426Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpx0qr_kp8/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6185446Z 2023-01-11T22:40:48.6185554Z Running tests... 2023-01-11T22:40:48.6185878Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6186179Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6186420Z test_ddp_checkpointing_twice_weight_sharing (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.6186687Z Checkpointing should work with static graph in the case of checkpointing ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6186902Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 75158 2023-01-11T22:40:48.6187116Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 75159 2023-01-11T22:40:48.6187484Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6187664Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6188044Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6188233Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6188579Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6188753Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6189129Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6189316Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6189569Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjym8wmu_ 2023-01-11T22:40:48.6189836Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjym8wmu_/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6190091Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpx9vj57ec 2023-01-11T22:40:48.6190319Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6190567Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpx9vj57ec/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6190793Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6191034Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6191274Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.6191675Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6192068Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6192358Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6192593Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6192817Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6193030Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6193132Z ok (6.047s) 2023-01-11T22:40:48.6193152Z 2023-01-11T22:40:48.6193417Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6193528Z Ran 1 test in 6.047s 2023-01-11T22:40:48.6193548Z 2023-01-11T22:40:48.6193639Z OK 2023-01-11T22:40:48.6193658Z 2023-01-11T22:40:48.6193779Z Generating XML reports... 
2023-01-11T22:40:48.6194234Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223032.xml 2023-01-11T22:40:48.6194607Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6194828Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6195197Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6195390Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6195643Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkfh79j9o 2023-01-11T22:40:48.6195910Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkfh79j9o/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6195930Z 2023-01-11T22:40:48.6196039Z Running tests... 2023-01-11T22:40:48.6196302Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6196608Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6196867Z test_ddp_checkpointing_unused_params_use_reentrant_False (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.6197120Z With reentrant autograd checkpointing impl, DDP will fail when there are ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6197338Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 75273 2023-01-11T22:40:48.6197550Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 75274 2023-01-11T22:40:48.6197921Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6198095Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6198472Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6198662Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6199022Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6199195Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6199546Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6199734Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6199988Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdnd6axtr 2023-01-11T22:40:48.6200255Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdnd6axtr/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6200481Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6200731Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9shwv3y5 2023-01-11T22:40:48.6201053Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9shwv3y5/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6201279Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6201507Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6201746Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.6202147Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6202539Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6203372Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2023-01-11T22:40:48.6204155Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. 
(function operator()) 2023-01-11T22:40:48.6205068Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1911: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2023-01-11T22:40:48.6205185Z warnings.warn( 2023-01-11T22:40:48.6206083Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1911: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2023-01-11T22:40:48.6206194Z warnings.warn( 2023-01-11T22:40:48.6206428Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6206660Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6206898Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6207129Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6207212Z ok (6.052s) 2023-01-11T22:40:48.6207249Z 2023-01-11T22:40:48.6207500Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6207612Z Ran 1 test in 6.053s 2023-01-11T22:40:48.6207631Z 2023-01-11T22:40:48.6207722Z OK 2023-01-11T22:40:48.6207741Z 2023-01-11T22:40:48.6207863Z Generating XML reports... 
2023-01-11T22:40:48.6208320Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223041.xml 2023-01-11T22:40:48.6208686Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6208860Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6209294Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6209467Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6209725Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnh0k8pv2 2023-01-11T22:40:48.6209992Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnh0k8pv2/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6210012Z 2023-01-11T22:40:48.6210121Z Running tests... 2023-01-11T22:40:48.6210382Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6210692Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6210943Z test_ddp_checkpointing_unused_params_use_reentrant_True (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.6211209Z With reentrant autograd checkpointing impl, DDP will fail when there are ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6211411Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 75388 2023-01-11T22:40:48.6211686Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 75389 2023-01-11T22:40:48.6212063Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6212240Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6212618Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6212807Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6213167Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6213339Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6213705Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6213878Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6214134Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpy1dq__8_ 2023-01-11T22:40:48.6214400Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpy1dq__8_/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6214626Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6214881Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpw2sl8ii0 2023-01-11T22:40:48.6215145Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpw2sl8ii0/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6215370Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6215612Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6215855Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.6216239Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6216866Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6217793Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1911: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2023-01-11T22:40:48.6217907Z warnings.warn( 2023-01-11T22:40:48.6218812Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1911: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2023-01-11T22:40:48.6219010Z warnings.warn( 2023-01-11T22:40:48.6219245Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6219474Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6219703Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6219932Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:40:48.6220015Z ok (6.049s) 2023-01-11T22:40:48.6220035Z 2023-01-11T22:40:48.6220303Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6220419Z Ran 1 test in 6.049s 2023-01-11T22:40:48.6220439Z 2023-01-11T22:40:48.6220531Z OK 2023-01-11T22:40:48.6220550Z 2023-01-11T22:40:48.6220673Z Generating XML reports... 2023-01-11T22:40:48.6221195Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223049.xml 2023-01-11T22:40:48.6221579Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6221754Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6222133Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6222305Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6222561Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpp_hqyi9_ 2023-01-11T22:40:48.6222832Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpp_hqyi9_/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6222857Z 2023-01-11T22:40:48.6222967Z Running tests... 2023-01-11T22:40:48.6223236Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6223546Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6223805Z test_ddp_checkpointing_weight_sharing_use_reentrant_False (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.6224040Z Test that checkpointing with weight sharing works. ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6224239Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 75503 2023-01-11T22:40:48.6224455Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 75504 2023-01-11T22:40:48.6224824Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6225000Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6225380Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6225572Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6225933Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6226105Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6226478Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6226646Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6226899Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpauonbl4w 2023-01-11T22:40:48.6227167Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpauonbl4w/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6227467Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6227723Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1a28hg3g 2023-01-11T22:40:48.6227989Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1a28hg3g/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6228214Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6228457Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6228678Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.6229080Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6229470Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6229705Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6229985Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6230221Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6230451Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6230553Z ok (6.050s) 2023-01-11T22:40:48.6230573Z 2023-01-11T22:40:48.6230840Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6230934Z Ran 1 test in 6.050s 2023-01-11T22:40:48.6230954Z 2023-01-11T22:40:48.6231047Z OK 2023-01-11T22:40:48.6231066Z 2023-01-11T22:40:48.6231188Z Generating XML reports... 
2023-01-11T22:40:48.6231645Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223058.xml 2023-01-11T22:40:48.6232017Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6232194Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6232567Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6232755Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6233008Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvm8o1bc9 2023-01-11T22:40:48.6233256Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvm8o1bc9/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6233276Z 2023-01-11T22:40:48.6233386Z Running tests... 2023-01-11T22:40:48.6233651Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6233960Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6234219Z test_ddp_checkpointing_weight_sharing_use_reentrant_True (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.6234460Z Test that checkpointing with weight sharing works. ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6234677Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 75618 2023-01-11T22:40:48.6234892Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 75619 2023-01-11T22:40:48.6235238Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6235413Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6235788Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6235980Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6236347Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6236575Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6236953Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6237139Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6237393Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpawsnzad1 2023-01-11T22:40:48.6237645Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpawsnzad1/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6237869Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6238120Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpa9izi6oj 2023-01-11T22:40:48.6238385Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpa9izi6oj/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6238616Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6238904Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6239151Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.6239551Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6239924Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6240156Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6240389Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6240615Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6240848Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6241081Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6241307Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6241531Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6241751Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6241834Z ok (6.024s) 2023-01-11T22:40:48.6241853Z 2023-01-11T22:40:48.6242118Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6242229Z Ran 1 test in 6.024s 2023-01-11T22:40:48.6242248Z 2023-01-11T22:40:48.6242341Z OK 2023-01-11T22:40:48.6242359Z 2023-01-11T22:40:48.6242483Z Generating XML reports... 
2023-01-11T22:40:48.6242943Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223106.xml 2023-01-11T22:40:48.6243314Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6243491Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6243852Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6244043Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6244297Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpi2xra0ax 2023-01-11T22:40:48.6244566Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpi2xra0ax/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6244586Z 2023-01-11T22:40:48.6244694Z Running tests... 2023-01-11T22:40:48.6244958Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6245326Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6245551Z test_ddp_comm_hook_future_passing_cpu (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.6245812Z This unit test verifies whether the Future object is passed properly. ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6246012Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 75733 2023-01-11T22:40:48.6246225Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 75734 2023-01-11T22:40:48.6246595Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6246770Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6247145Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6247337Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6247695Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6247911Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6248275Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6248460Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6248711Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7su3nycz 2023-01-11T22:40:48.6248979Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7su3nycz/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6249204Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6249454Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzll849os 2023-01-11T22:40:48.6249725Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzll849os/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6249952Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6250195Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6250419Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.6250819Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6251211Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6251312Z ok (3.951s) 2023-01-11T22:40:48.6251332Z 2023-01-11T22:40:48.6251594Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6251710Z Ran 1 test in 3.952s 2023-01-11T22:40:48.6251729Z 2023-01-11T22:40:48.6251821Z OK 2023-01-11T22:40:48.6251840Z 2023-01-11T22:40:48.6251964Z Generating XML reports... 2023-01-11T22:40:48.6252402Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223114.xml 2023-01-11T22:40:48.6252771Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6252945Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6253320Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6253513Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6253766Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7gtjws74 2023-01-11T22:40:48.6254035Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7gtjws74/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6254105Z 2023-01-11T22:40:48.6254218Z Running tests... 
2023-01-11T22:40:48.6254485Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6254775Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6255003Z test_ddp_comm_hook_future_passing_gpu_gloo (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.6255289Z This unit test verifies whether the Future object is passed properly using gloo backend. ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6255502Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 75846 2023-01-11T22:40:48.6255716Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 75847 2023-01-11T22:40:48.6256082Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6256263Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6256870Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6257146Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6257515Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6257693Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6258063Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6258249Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6258505Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpm4juqxs5 2023-01-11T22:40:48.6258775Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpm4juqxs5/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6259033Z 
INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpr34fmdey 2023-01-11T22:40:48.6259300Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpr34fmdey/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6259509Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6259736Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6259979Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6260217Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.6260618Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6261012Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6261117Z ok (5.474s) 2023-01-11T22:40:48.6261137Z 2023-01-11T22:40:48.6261399Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6261515Z Ran 1 test in 5.474s 2023-01-11T22:40:48.6261535Z 2023-01-11T22:40:48.6261608Z OK 2023-01-11T22:40:48.6261627Z 2023-01-11T22:40:48.6261751Z Generating XML reports... 
2023-01-11T22:40:48.6262208Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223121.xml 2023-01-11T22:40:48.6262575Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6262750Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6263125Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6263315Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6263646Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkl3p3teo 2023-01-11T22:40:48.6263918Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkl3p3teo/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6263939Z 2023-01-11T22:40:48.6264032Z Running tests... 2023-01-11T22:40:48.6264294Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6264603Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6264821Z test_ddp_comm_hook_register_just_once (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.6265099Z DDP communication hook can only be registered once. This test validates whether ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6265312Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 75961 2023-01-11T22:40:48.6265527Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 75962 2023-01-11T22:40:48.6265897Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6266103Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6266487Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6266676Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6267038Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6267212Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6267580Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6267767Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6268024Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbo80fh5m 2023-01-11T22:40:48.6268277Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbo80fh5m/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6268504Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6268759Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpd70zpg90 2023-01-11T22:40:48.6269024Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpd70zpg90/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6269248Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6269492Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6269734Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.6270134Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6270528Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6270612Z ok (3.955s) 2023-01-11T22:40:48.6270650Z 2023-01-11T22:40:48.6270948Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6271062Z Ran 1 test in 3.956s 2023-01-11T22:40:48.6271082Z 2023-01-11T22:40:48.6271175Z OK 2023-01-11T22:40:48.6271194Z 2023-01-11T22:40:48.6271317Z Generating XML reports... 2023-01-11T22:40:48.6271777Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223128.xml 2023-01-11T22:40:48.6272140Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6272315Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6272754Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6272929Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6273181Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9msh03zx 2023-01-11T22:40:48.6273449Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9msh03zx/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6273469Z 2023-01-11T22:40:48.6273576Z Running tests... 
2023-01-11T22:40:48.6273840Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6274148Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6274369Z test_ddp_comm_hook_sparse_gradients (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.6274639Z Runs "test_sparse_gradients" unit test with DDP communication hook. We define a ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6274841Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 76070 2023-01-11T22:40:48.6275107Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 76071 2023-01-11T22:40:48.6275485Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6275660Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6276037Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6276227Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6276591Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6276762Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6277138Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6277308Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6277562Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpr045wbq5 2023-01-11T22:40:48.6277824Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpr045wbq5/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6278073Z INFO:torch.distributed.nn.jit.instantiator:Created a 
temporary directory at /tmp/tmp8mg_2cc_ 2023-01-11T22:40:48.6278335Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8mg_2cc_/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6278564Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6278791Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6279033Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6279258Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.6279658Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6280048Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6280148Z ok (3.948s) 2023-01-11T22:40:48.6280168Z 2023-01-11T22:40:48.6280429Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6280541Z Ran 1 test in 3.948s 2023-01-11T22:40:48.6280560Z 2023-01-11T22:40:48.6280652Z OK 2023-01-11T22:40:48.6280671Z 2023-01-11T22:40:48.6280794Z Generating XML reports... 
2023-01-11T22:40:48.6281246Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223135.xml 2023-01-11T22:40:48.6281654Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6281835Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6282210Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6282400Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6282653Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpi7i56cxd 2023-01-11T22:40:48.6282918Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpi7i56cxd/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6282937Z 2023-01-11T22:40:48.6283044Z Running tests... 2023-01-11T22:40:48.6283303Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6283610Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6283806Z test_ddp_invalid_comm_hook_init (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.6284124Z This unit test makes sure that register_comm_hook properly checks the format ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6284349Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 76213 2023-01-11T22:40:48.6284562Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 76214 2023-01-11T22:40:48.6284931Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6285104Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6285476Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6285666Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6286006Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6286181Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6286552Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6286741Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6286993Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptjm4xo0l 2023-01-11T22:40:48.6287257Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptjm4xo0l/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6287510Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpspk8_z18 2023-01-11T22:40:48.6287772Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpspk8_z18/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6287998Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6288204Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6288448Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6288689Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.6289088Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6289485Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6289587Z ok (3.966s) 2023-01-11T22:40:48.6289606Z 2023-01-11T22:40:48.6289867Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6289979Z Ran 1 test in 3.966s 2023-01-11T22:40:48.6289998Z 2023-01-11T22:40:48.6290074Z OK 2023-01-11T22:40:48.6290209Z 2023-01-11T22:40:48.6290320Z Generating XML reports... 2023-01-11T22:40:48.6290776Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223141.xml 2023-01-11T22:40:48.6291147Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6291323Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6291698Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6291888Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6292141Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzhsqd_xb 2023-01-11T22:40:48.6292408Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzhsqd_xb/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6292428Z 2023-01-11T22:40:48.6292521Z Running tests... 
2023-01-11T22:40:48.6292782Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6293089Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6293356Z test_ddp_invalid_comm_hook_return_type (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.6293639Z This test checks whether return annotation checked properly if defined. It also ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6293855Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 76322 2023-01-11T22:40:48.6294071Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 76323 2023-01-11T22:40:48.6294442Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6294600Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6294977Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6295168Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6295533Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6295706Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6296074Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6296260Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6296513Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6xow0suy 2023-01-11T22:40:48.6297023Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6xow0suy/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6297236Z INFO:torch.testing._internal.common_distributed:Starting 
event listener thread for rank 0 2023-01-11T22:40:48.6297490Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpinzsvudb 2023-01-11T22:40:48.6297758Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpinzsvudb/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6297983Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6298224Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6298465Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.6298869Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6299262Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6299363Z ok (3.927s) 2023-01-11T22:40:48.6299461Z 2023-01-11T22:40:48.6299716Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6299829Z Ran 1 test in 3.927s 2023-01-11T22:40:48.6299848Z 2023-01-11T22:40:48.6299940Z OK 2023-01-11T22:40:48.6299962Z 2023-01-11T22:40:48.6300084Z Generating XML reports... 
2023-01-11T22:40:48.6300540Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223147.xml 2023-01-11T22:40:48.6300904Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6301080Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6301453Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6301626Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6301877Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6klmg7nf 2023-01-11T22:40:48.6302146Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6klmg7nf/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6302165Z 2023-01-11T22:40:48.6302336Z Running tests... 2023-01-11T22:40:48.6302608Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6302914Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6303156Z test_find_unused_parameters_when_unused_parameters_empty (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.6303416Z An empty unused_parameters array does not imply find_unused_parameters = ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6303634Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 76435 2023-01-11T22:40:48.6303833Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 76436 2023-01-11T22:40:48.6304194Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6304373Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6304754Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6304943Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6305299Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6305471Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6305839Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6306006Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6306257Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphne72xgt 2023-01-11T22:40:48.6306520Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphne72xgt/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6306767Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprcgqp2sm 2023-01-11T22:40:48.6307031Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprcgqp2sm/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6307255Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6307474Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6307711Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6307945Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.6308322Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6308776Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6309556Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2023-01-11T22:40:48.6309662Z ok (5.529s) 2023-01-11T22:40:48.6309681Z 2023-01-11T22:40:48.6309941Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6310051Z Ran 1 test in 5.529s 2023-01-11T22:40:48.6310071Z 2023-01-11T22:40:48.6310158Z OK 2023-01-11T22:40:48.6310180Z 2023-01-11T22:40:48.6310300Z Generating XML reports... 
2023-01-11T22:40:48.6310746Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223153.xml 2023-01-11T22:40:48.6311154Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6311334Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6311694Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6311882Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6312129Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp978l3v_k 2023-01-11T22:40:48.6312394Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp978l3v_k/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6312414Z 2023-01-11T22:40:48.6312525Z Running tests... 2023-01-11T22:40:48.6312782Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6313084Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6313363Z test_global_local_unused_params_grad (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6313567Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 76550 2023-01-11T22:40:48.6313781Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 76551 2023-01-11T22:40:48.6314141Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6314312Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6314679Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6314868Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6315222Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6315391Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6315757Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6315926Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6316169Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmps061b797 2023-01-11T22:40:48.6316424Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmps061b797/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6316644Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6316889Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1t9otix3 2023-01-11T22:40:48.6317211Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1t9otix3/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6317441Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6317677Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6317899Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.6318297Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6318680Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6318781Z ok (5.529s) 2023-01-11T22:40:48.6318801Z 2023-01-11T22:40:48.6319058Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6319172Z Ran 1 test in 5.529s 2023-01-11T22:40:48.6319192Z 2023-01-11T22:40:48.6319283Z OK 2023-01-11T22:40:48.6319302Z 2023-01-11T22:40:48.6319417Z Generating XML reports... 2023-01-11T22:40:48.6319905Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223201.xml 2023-01-11T22:40:48.6320262Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6320435Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6320805Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6320994Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6321247Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfcuazq3t 2023-01-11T22:40:48.6321509Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfcuazq3t/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6321533Z 2023-01-11T22:40:48.6321637Z Running tests... 
2023-01-11T22:40:48.6321896Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6322189Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6322495Z test_global_local_unused_params_grad_with_grad_is_view (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6322709Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 76665 2023-01-11T22:40:48.6322918Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 76666 2023-01-11T22:40:48.6323277Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6323445Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6323813Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6324000Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6324351Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6324504Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6324860Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6325034Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6325277Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpze31k4bj 2023-01-11T22:40:48.6325533Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpze31k4bj/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6325750Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6326046Z 
INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp11w1refk 2023-01-11T22:40:48.6326304Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp11w1refk/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6326518Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6326739Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6326970Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.6327356Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6327734Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6327826Z ok (5.509s) 2023-01-11T22:40:48.6327849Z 2023-01-11T22:40:48.6328100Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6328202Z Ran 1 test in 5.509s 2023-01-11T22:40:48.6328222Z 2023-01-11T22:40:48.6328303Z OK 2023-01-11T22:40:48.6328366Z 2023-01-11T22:40:48.6328476Z Generating XML reports... 
2023-01-11T22:40:48.6328932Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223209.xml 2023-01-11T22:40:48.6329293Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6329460Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6329828Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6330005Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6330245Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpak02lntd 2023-01-11T22:40:48.6330506Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpak02lntd/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6330526Z 2023-01-11T22:40:48.6330626Z Running tests... 2023-01-11T22:40:48.6330871Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6331168Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6331465Z test_global_local_unused_params_grad_with_static_graph (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6331671Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 76780 2023-01-11T22:40:48.6331873Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 76781 2023-01-11T22:40:48.6332230Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6332397Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6332762Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6332949Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6333293Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6333454Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6333811Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6333987Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6334237Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpce248vlj 2023-01-11T22:40:48.6334493Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpce248vlj/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6334792Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpl5wjwc6o 2023-01-11T22:40:48.6335057Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpl5wjwc6o/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6335266Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6335483Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6335718Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6335954Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.6336348Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6336967Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6337956Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1911: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2023-01-11T22:40:48.6338068Z warnings.warn( 2023-01-11T22:40:48.6338972Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1911: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2023-01-11T22:40:48.6339072Z warnings.warn( 2023-01-11T22:40:48.6339173Z ok (5.538s) 2023-01-11T22:40:48.6339193Z 2023-01-11T22:40:48.6339439Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6339542Z Ran 1 test in 5.539s 2023-01-11T22:40:48.6339561Z 2023-01-11T22:40:48.6339652Z OK 2023-01-11T22:40:48.6339671Z 2023-01-11T22:40:48.6339781Z Generating XML reports... 
2023-01-11T22:40:48.6340231Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223217.xml 2023-01-11T22:40:48.6340590Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6340764Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6341138Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6341310Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6341566Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpshix93on 2023-01-11T22:40:48.6341836Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpshix93on/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6341858Z 2023-01-11T22:40:48.6341964Z Running tests... 2023-01-11T22:40:48.6342220Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6342529Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6342830Z test_gloo_backend_1gpu_module_device_ids_integer_list (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6343045Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 76895 2023-01-11T22:40:48.6343254Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 76896 2023-01-11T22:40:48.6343604Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6343855Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6344232Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6344419Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6344772Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6344940Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6345303Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6345487Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6345722Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqex4dn05 2023-01-11T22:40:48.6345983Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqex4dn05/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6346233Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqubvwynv 2023-01-11T22:40:48.6346542Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqubvwynv/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6346773Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6346996Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6347235Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6347470Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.6347855Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6348228Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6348455Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6348685Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6348784Z ok (5.913s) 2023-01-11T22:40:48.6348804Z 2023-01-11T22:40:48.6349061Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6349170Z Ran 1 test in 5.914s 2023-01-11T22:40:48.6349190Z 2023-01-11T22:40:48.6349276Z OK 2023-01-11T22:40:48.6349295Z 2023-01-11T22:40:48.6349412Z Generating XML reports... 
2023-01-11T22:40:48.6349866Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223225.xml 2023-01-11T22:40:48.6350216Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6350389Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6350758Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6350946Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6351196Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdvjxw3bi 2023-01-11T22:40:48.6351459Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdvjxw3bi/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6351479Z 2023-01-11T22:40:48.6351580Z Running tests... 2023-01-11T22:40:48.6351834Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6352122Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6352424Z test_gloo_backend_1gpu_module_device_ids_torch_device_list (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6352708Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 77012 2023-01-11T22:40:48.6352916Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 77013 2023-01-11T22:40:48.6353284Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6353450Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6353817Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6353998Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6354350Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6354503Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6354869Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6355053Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6355360Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpyjm7sp2j 2023-01-11T22:40:48.6355634Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpyjm7sp2j/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6355850Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6356091Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7nkn_x3y 2023-01-11T22:40:48.6356344Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7nkn_x3y/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6356552Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6356784Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6357023Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.6357424Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6357808Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6358035Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6358255Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6358350Z ok (6.050s) 2023-01-11T22:40:48.6358371Z 2023-01-11T22:40:48.6358622Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6358717Z Ran 1 test in 6.050s 2023-01-11T22:40:48.6358736Z 2023-01-11T22:40:48.6358825Z OK 2023-01-11T22:40:48.6358847Z 2023-01-11T22:40:48.6358958Z Generating XML reports... 
2023-01-11T22:40:48.6359407Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223233.xml 2023-01-11T22:40:48.6359765Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6359932Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6360296Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6360481Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6360723Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqmtkunaw 2023-01-11T22:40:48.6360972Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqmtkunaw/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6360993Z 2023-01-11T22:40:48.6361148Z Running tests... 2023-01-11T22:40:48.6361403Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6361702Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6361967Z test_gloo_backend_2gpu_module (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6362176Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 77129 2023-01-11T22:40:48.6362379Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 77130 2023-01-11T22:40:48.6362742Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6362900Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6363269Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6363448Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6363807Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6364017Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6364396Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6364573Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6364821Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0sl5x48b 2023-01-11T22:40:48.6365082Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0sl5x48b/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6365291Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6365536Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpx8g4xls0 2023-01-11T22:40:48.6365800Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpx8g4xls0/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6366020Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6366169Z 
skip: Need at least 4 CUDA devices (3.950s) 2023-01-11T22:40:48.6366189Z 2023-01-11T22:40:48.6366444Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6366550Z Ran 1 test in 3.950s 2023-01-11T22:40:48.6366570Z 2023-01-11T22:40:48.6366666Z OK (skipped=1) 2023-01-11T22:40:48.6366685Z 2023-01-11T22:40:48.6366790Z Generating XML reports... 2023-01-11T22:40:48.6367243Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223241.xml 2023-01-11T22:40:48.6367599Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6367768Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6368204Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6368387Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6368627Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxo1qo0mo 2023-01-11T22:40:48.6368887Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxo1qo0mo/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6368907Z 2023-01-11T22:40:48.6369008Z Running tests... 2023-01-11T22:40:48.6369254Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6369553Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6369814Z test_gloo_backend_4gpu_module (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6370018Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 77232 2023-01-11T22:40:48.6370289Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 77233 2023-01-11T22:40:48.6370649Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6370862Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6371233Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6371404Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6371760Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6371922Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6372283Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6372465Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6372757Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9lbvtnl7 2023-01-11T22:40:48.6373023Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9lbvtnl7/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6373241Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6373479Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_txsmhov 2023-01-11T22:40:48.6373726Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_txsmhov/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6373940Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6374079Z 
skip: Need at least 8 CUDA devices (3.964s) 2023-01-11T22:40:48.6374099Z 2023-01-11T22:40:48.6374353Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6374464Z Ran 1 test in 3.964s 2023-01-11T22:40:48.6374483Z 2023-01-11T22:40:48.6374579Z OK (skipped=1) 2023-01-11T22:40:48.6374598Z 2023-01-11T22:40:48.6374715Z Generating XML reports... 2023-01-11T22:40:48.6375162Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223248.xml 2023-01-11T22:40:48.6375517Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6375674Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6376046Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6376227Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6376469Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnd0dfcc8 2023-01-11T22:40:48.6376971Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnd0dfcc8/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6376993Z 2023-01-11T22:40:48.6377094Z Running tests... 2023-01-11T22:40:48.6377359Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6377656Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6377911Z test_gloo_backend_cpu_module (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6378118Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 77335 2023-01-11T22:40:48.6378324Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 77336 2023-01-11T22:40:48.6378679Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6378841Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6379308Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6379491Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6379843Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6380004Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6380353Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6380527Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6380767Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_xiibtxr 2023-01-11T22:40:48.6381023Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_xiibtxr/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6381275Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxz3bnf03 2023-01-11T22:40:48.6381529Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxz3bnf03/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6381805Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6382033Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6382259Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6382488Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.6382881Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6383266Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6383494Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6383716Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6383809Z ok (4.053s) 2023-01-11T22:40:48.6383829Z 2023-01-11T22:40:48.6384081Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6384181Z Ran 1 test in 4.053s 2023-01-11T22:40:48.6384201Z 2023-01-11T22:40:48.6384275Z OK 2023-01-11T22:40:48.6384293Z 2023-01-11T22:40:48.6384407Z Generating XML reports... 
2023-01-11T22:40:48.6384849Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223254.xml 2023-01-11T22:40:48.6385203Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6385370Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6385736Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6385916Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6386160Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1axidcdy 2023-01-11T22:40:48.6386411Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1axidcdy/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6386438Z 2023-01-11T22:40:48.6386528Z Running tests... 2023-01-11T22:40:48.6386780Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6387077Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6387361Z test_gloo_backend_cpu_module_grad_is_view (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6387570Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 77450 2023-01-11T22:40:48.6387832Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 77451 2023-01-11T22:40:48.6388194Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6388363Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6388718Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6388899Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6389254Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6389415Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6389775Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6389952Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6390201Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0xdvkuy3 2023-01-11T22:40:48.6390512Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0xdvkuy3/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6390726Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6390969Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpb846mjux 2023-01-11T22:40:48.6391225Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpb846mjux/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6391438Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6391678Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6391908Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.6392309Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6392695Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6392922Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6393135Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6393230Z ok (3.922s) 2023-01-11T22:40:48.6393249Z 2023-01-11T22:40:48.6393514Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6393620Z Ran 1 test in 3.922s 2023-01-11T22:40:48.6393639Z 2023-01-11T22:40:48.6393720Z OK 2023-01-11T22:40:48.6393739Z 2023-01-11T22:40:48.6393858Z Generating XML reports... 
2023-01-11T22:40:48.6394302Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223300.xml 2023-01-11T22:40:48.6394667Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6394831Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6395192Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6395377Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6395623Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpc10e88fj 2023-01-11T22:40:48.6395877Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpc10e88fj/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6395897Z 2023-01-11T22:40:48.6395996Z Running tests... 2023-01-11T22:40:48.6396248Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6396612Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6396796Z test_ignored_output (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.6397036Z Test that the output of a model can be ignored and that there is no ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6397249Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 77565 2023-01-11T22:40:48.6397455Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 77566 2023-01-11T22:40:48.6397820Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6397982Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6398346Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6398529Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6398877Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6399091Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6399451Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6399628Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6399880Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplnky6dxu 2023-01-11T22:40:48.6400143Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplnky6dxu/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6400367Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6400617Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpetd153iw 2023-01-11T22:40:48.6400883Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpetd153iw/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6401108Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6401335Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6401573Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.6401970Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6402361Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6402597Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6402825Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6402926Z ok (3.912s) 2023-01-11T22:40:48.6402946Z 2023-01-11T22:40:48.6403193Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6403300Z Ran 1 test in 3.912s 2023-01-11T22:40:48.6403323Z 2023-01-11T22:40:48.6403399Z OK 2023-01-11T22:40:48.6403418Z 2023-01-11T22:40:48.6403540Z Generating XML reports... 
2023-01-11T22:40:48.6403988Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223306.xml 2023-01-11T22:40:48.6404345Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6404520Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6404890Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6405076Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6405378Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7kbfy4fh 2023-01-11T22:40:48.6405632Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7kbfy4fh/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6405658Z 2023-01-11T22:40:48.6405749Z Running tests... 2023-01-11T22:40:48.6406012Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6406316Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6406540Z test_ignored_output_with_unused_parameters (__main__.DistributedDataParallelTest) 2023-01-11T22:40:48.6406785Z Test that the output of a model can be ignored and that there is no ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6406991Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 77708 2023-01-11T22:40:48.6407201Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 77709 2023-01-11T22:40:48.6407559Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6407715Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6408132Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6408322Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6408674Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6408843Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6409206Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6409386Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6409629Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8qjcgcey 2023-01-11T22:40:48.6409883Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8qjcgcey/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6410130Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmhnhgqz6 2023-01-11T22:40:48.6410387Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmhnhgqz6/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6410608Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6410823Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6411065Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6411299Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.6411683Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6412073Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6412161Z ok (3.967s) 2023-01-11T22:40:48.6412190Z 2023-01-11T22:40:48.6412434Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6412545Z Ran 1 test in 3.967s 2023-01-11T22:40:48.6412564Z 2023-01-11T22:40:48.6412649Z OK 2023-01-11T22:40:48.6412668Z 2023-01-11T22:40:48.6412780Z Generating XML reports... 2023-01-11T22:40:48.6413228Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223313.xml 2023-01-11T22:40:48.6413581Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6413749Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6414118Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6414363Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6414611Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfycqz3yh 2023-01-11T22:40:48.6414874Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfycqz3yh/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6414894Z 2023-01-11T22:40:48.6414993Z Running tests... 
2023-01-11T22:40:48.6415255Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6415553Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6415818Z test_ignored_sharded_tensor (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6416026Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 77851 2023-01-11T22:40:48.6416227Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 77852 2023-01-11T22:40:48.6416817Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6417077Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6417467Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6417654Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6418008Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6418169Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6418537Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6418721Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6418963Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqxh6h7q_ 2023-01-11T22:40:48.6419223Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqxh6h7q_/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6419471Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnan2vecx 2023-01-11T22:40:48.6419731Z INFO:torch.distributed.nn.jit.instantiator:Writing 
/tmp/tmpnan2vecx/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6419948Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6420170Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6420400Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6420635Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.6421020Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6421410Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6421502Z ok (5.518s) 2023-01-11T22:40:48.6421522Z 2023-01-11T22:40:48.6421771Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6421879Z Ran 1 test in 5.519s 2023-01-11T22:40:48.6421899Z 2023-01-11T22:40:48.6421986Z OK 2023-01-11T22:40:48.6422005Z 2023-01-11T22:40:48.6422117Z Generating XML reports... 
2023-01-11T22:40:48.6422567Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223319.xml 2023-01-11T22:40:48.6422925Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6423160Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6423534Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6423717Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6423964Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp65kkuajn 2023-01-11T22:40:48.6424220Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp65kkuajn/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6424240Z 2023-01-11T22:40:48.6424346Z Running tests... 2023-01-11T22:40:48.6424599Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6424899Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6425161Z test_invalid_powerSGD_state (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6425365Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 77962 2023-01-11T22:40:48.6425572Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 77963 2023-01-11T22:40:48.6426007Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6426185Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6426551Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6426733Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6427080Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6427245Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6427592Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6427773Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6428025Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpi38pfavl 2023-01-11T22:40:48.6428283Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpi38pfavl/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6428521Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5exioon5 2023-01-11T22:40:48.6428780Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5exioon5/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6428994Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6429526Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; 
start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2023-01-11T22:40:48.6430067Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2023-01-11T22:40:48.6430586Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2023-01-11T22:40:48.6431115Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2023-01-11T22:40:48.6431697Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2023-01-11T22:40:48.6432227Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; 
compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2023-01-11T22:40:48.6432443Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6433009Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2023-01-11T22:40:48.6433535Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2023-01-11T22:40:48.6434055Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2023-01-11T22:40:48.6434589Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2023-01-11T22:40:48.6435110Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 
10000; batch_tensors_with_same_shape = False 2023-01-11T22:40:48.6435637Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2023-01-11T22:40:48.6435732Z ok (3.951s) 2023-01-11T22:40:48.6435754Z 2023-01-11T22:40:48.6436010Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6436117Z Ran 1 test in 3.951s 2023-01-11T22:40:48.6436136Z 2023-01-11T22:40:48.6436218Z OK 2023-01-11T22:40:48.6436238Z 2023-01-11T22:40:48.6436356Z Generating XML reports... 2023-01-11T22:40:48.6436805Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223327.xml 2023-01-11T22:40:48.6437165Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6437336Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6437709Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6437949Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6438186Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_bphcbr6 2023-01-11T22:40:48.6438449Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_bphcbr6/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6438469Z 2023-01-11T22:40:48.6438576Z Running tests... 
2023-01-11T22:40:48.6438838Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6439145Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6439409Z test_save_load_checkpoint (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6439623Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 78065 2023-01-11T22:40:48.6439835Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 78066 2023-01-11T22:40:48.6440186Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6440402Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6440787Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6440974Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6441336Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6441506Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6441865Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6442045Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6442286Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpiuz_d2yy 2023-01-11T22:40:48.6442555Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpiuz_d2yy/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6442780Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6443026Z INFO:torch.distributed.nn.jit.instantiator:Created a 
temporary directory at /tmp/tmp_5r8jm77 2023-01-11T22:40:48.6443289Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_5r8jm77/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6443514Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6443756Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6443991Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.6444391Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6444772Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6445003Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6445233Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6445461Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6445689Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6445790Z ok (6.037s) 2023-01-11T22:40:48.6445810Z 2023-01-11T22:40:48.6446070Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6446180Z Ran 1 test in 6.037s 2023-01-11T22:40:48.6446199Z 2023-01-11T22:40:48.6446342Z OK 2023-01-11T22:40:48.6446361Z 2023-01-11T22:40:48.6446467Z Generating XML reports... 
2023-01-11T22:40:48.6446931Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223333.xml 2023-01-11T22:40:48.6447295Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6447466Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6447837Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6448024Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6448273Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7qwjevlb 2023-01-11T22:40:48.6448539Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7qwjevlb/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6448562Z 2023-01-11T22:40:48.6448655Z Running tests... 2023-01-11T22:40:48.6448915Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6449265Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6449535Z test_sparse_gradients (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6449752Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 78180 2023-01-11T22:40:48.6449967Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 78181 2023-01-11T22:40:48.6450332Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6450503Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6450875Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6451051Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6451411Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6451581Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6451948Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6452136Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6452389Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpl_9ay3xv 2023-01-11T22:40:48.6452652Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpl_9ay3xv/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6452896Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp89w2whf2 2023-01-11T22:40:48.6453142Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp89w2whf2/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6453371Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6453597Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6453841Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6454079Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.6454476Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6454867Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6454967Z ok (3.954s) 2023-01-11T22:40:48.6454987Z 2023-01-11T22:40:48.6455243Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6455394Z Ran 1 test in 3.954s 2023-01-11T22:40:48.6455414Z 2023-01-11T22:40:48.6455507Z OK 2023-01-11T22:40:48.6455526Z 2023-01-11T22:40:48.6455646Z Generating XML reports... 2023-01-11T22:40:48.6456104Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223341.xml 2023-01-11T22:40:48.6456468Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6456872Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6457266Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6457452Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6457707Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplycsgo_t 2023-01-11T22:40:48.6457960Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplycsgo_t/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6457984Z 2023-01-11T22:40:48.6458093Z Running tests... 
2023-01-11T22:40:48.6458428Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6458751Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6459037Z test_sparse_gradients_grad_is_view (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6459257Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 78323 2023-01-11T22:40:48.6459471Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 78324 2023-01-11T22:40:48.6459834Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6459993Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6460367Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6460559Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6460920Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6461091Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6461459Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6461643Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6461896Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphbfmu_br 2023-01-11T22:40:48.6462160Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphbfmu_br/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6462369Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6462623Z INFO:torch.distributed.nn.jit.instantiator:Created 
a temporary directory at /tmp/tmp2w8x_k6y 2023-01-11T22:40:48.6462890Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2w8x_k6y/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6463116Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6463354Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6463593Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.6463989Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6464383Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6464467Z ok (4.009s) 2023-01-11T22:40:48.6464570Z 2023-01-11T22:40:48.6464825Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6464936Z Ran 1 test in 4.009s 2023-01-11T22:40:48.6464956Z 2023-01-11T22:40:48.6465047Z OK 2023-01-11T22:40:48.6465070Z 2023-01-11T22:40:48.6465191Z Generating XML reports... 
2023-01-11T22:40:48.6465647Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223348.xml 2023-01-11T22:40:48.6466013Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6466186Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6466558Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6466732Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6466987Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpijh2sup1 2023-01-11T22:40:48.6467252Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpijh2sup1/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6467272Z 2023-01-11T22:40:48.6467421Z Running tests... 2023-01-11T22:40:48.6467687Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6467993Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6468264Z test_sync_batch_norm_empty_input (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6468477Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 78466 2023-01-11T22:40:48.6468675Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 78467 2023-01-11T22:40:48.6469036Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6469209Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6469574Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6469757Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6470113Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6470285Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6470651Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6470880Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6471119Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6kpf33c1 2023-01-11T22:40:48.6471381Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6kpf33c1/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6471610Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6471862Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpe7o0uvav 2023-01-11T22:40:48.6472125Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpe7o0uvav/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6472347Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6472588Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6472830Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.6473222Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6473596Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6473904Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6474139Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6474238Z ok (6.867s) 2023-01-11T22:40:48.6474258Z 2023-01-11T22:40:48.6474527Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6474635Z Ran 1 test in 6.867s 2023-01-11T22:40:48.6474654Z 2023-01-11T22:40:48.6474745Z OK 2023-01-11T22:40:48.6474764Z 2023-01-11T22:40:48.6474885Z Generating XML reports... 
2023-01-11T22:40:48.6475324Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223354.xml 2023-01-11T22:40:48.6475689Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6475862Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6476239Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6476472Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6476727Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9nza0ckf 2023-01-11T22:40:48.6476993Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9nza0ckf/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6477013Z 2023-01-11T22:40:48.6477119Z Running tests... 2023-01-11T22:40:48.6477380Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6477670Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6477948Z test_sync_batch_norm_only_empty_input (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6478164Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 78581 2023-01-11T22:40:48.6478382Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 78582 2023-01-11T22:40:48.6478747Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6478921Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6479295Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6479483Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6479823Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6479997Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6480364Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6480551Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6480810Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxq6thjo7 2023-01-11T22:40:48.6481077Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxq6thjo7/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6481329Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqkm_e63w 2023-01-11T22:40:48.6481589Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqkm_e63w/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6481815Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6482024Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6482264Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6482563Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.6482967Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6483359Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:40:48.6483592Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6483824Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:40:48.6483924Z ok (6.222s) 2023-01-11T22:40:48.6483943Z 2023-01-11T22:40:48.6484203Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6484296Z Ran 1 test in 6.222s 2023-01-11T22:40:48.6484317Z 2023-01-11T22:40:48.6484410Z OK 2023-01-11T22:40:48.6484429Z 2023-01-11T22:40:48.6484549Z Generating XML reports... 
2023-01-11T22:40:48.6485004Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223403.xml 2023-01-11T22:40:48.6485417Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6485596Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6485974Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6486164Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6486401Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgu9euqjc 2023-01-11T22:40:48.6486668Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgu9euqjc/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6486689Z 2023-01-11T22:40:48.6486791Z Running tests... 2023-01-11T22:40:48.6487051Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6487362Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6487687Z test_all_to_all_single (__main__.GlooProcessGroupWithDispatchedCollectivesTests) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6487904Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 78696 2023-01-11T22:40:48.6488267Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6488440Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6488798Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6488988Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6489241Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjni8tn0_ 2023-01-11T22:40:48.6489507Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjni8tn0_/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6489735Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6489976Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6490371Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:40:48.6490470Z ok (3.869s) 2023-01-11T22:40:48.6490489Z 2023-01-11T22:40:48.6490732Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6490842Z Ran 1 test in 3.870s 2023-01-11T22:40:48.6490861Z 2023-01-11T22:40:48.6490953Z OK 2023-01-11T22:40:48.6490972Z 2023-01-11T22:40:48.6491092Z Generating XML reports... 
2023-01-11T22:40:48.6491634Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-GlooProcessGroupWithDispatchedCollectivesTests-20230111223412.xml 2023-01-11T22:40:48.6492057Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6492236Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6492612Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6492803Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6493038Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6ip37odm 2023-01-11T22:40:48.6493304Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6ip37odm/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6493324Z 2023-01-11T22:40:48.6493431Z Running tests... 2023-01-11T22:40:48.6493692Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6493999Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6494335Z test_allgather_coalesced (__main__.GlooProcessGroupWithDispatchedCollectivesTests) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6494593Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 78768 2023-01-11T22:40:48.6494965Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6495139Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6495498Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6495687Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6495935Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9l67fhom 2023-01-11T22:40:48.6496201Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9l67fhom/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6496430Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6496865Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6497273Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:40:48.6498020Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2588: UserWarning: torch.distributed.all_gather_coalesced will be deprecated. 
If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2023-01-11T22:40:48.6498132Z warnings.warn( 2023-01-11T22:40:48.6498215Z ok (3.817s) 2023-01-11T22:40:48.6498235Z 2023-01-11T22:40:48.6498497Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6498607Z Ran 1 test in 3.817s 2023-01-11T22:40:48.6498629Z 2023-01-11T22:40:48.6498723Z OK 2023-01-11T22:40:48.6498742Z 2023-01-11T22:40:48.6498861Z Generating XML reports... 2023-01-11T22:40:48.6499410Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-GlooProcessGroupWithDispatchedCollectivesTests-20230111223418.xml 2023-01-11T22:40:48.6499774Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6499951Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6500310Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6500498Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6500747Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpppzw0xmf 2023-01-11T22:40:48.6501015Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpppzw0xmf/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6501116Z 2023-01-11T22:40:48.6501232Z Running tests... 2023-01-11T22:40:48.6501499Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6501814Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6502144Z test_allreduce_coalesced (__main__.GlooProcessGroupWithDispatchedCollectivesTests) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6502358Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 78840 2023-01-11T22:40:48.6502706Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6502886Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6503263Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6503456Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6503707Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp25n4c6r6 2023-01-11T22:40:48.6504037Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp25n4c6r6/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6504276Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6504518Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6504918Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:40:48.6505634Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1714: UserWarning: torch.distributed.all_reduce_coalesced will be deprecated. 
If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2023-01-11T22:40:48.6505750Z warnings.warn( 2023-01-11T22:40:48.6505849Z ok (3.843s) 2023-01-11T22:40:48.6505869Z 2023-01-11T22:40:48.6506131Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6506244Z Ran 1 test in 3.843s 2023-01-11T22:40:48.6506264Z 2023-01-11T22:40:48.6506357Z OK 2023-01-11T22:40:48.6506376Z 2023-01-11T22:40:48.6506497Z Generating XML reports... 2023-01-11T22:40:48.6507046Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-GlooProcessGroupWithDispatchedCollectivesTests-20230111223424.xml 2023-01-11T22:40:48.6507411Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6507569Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6507940Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6508132Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6508383Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpd5dr4oz8 2023-01-11T22:40:48.6508651Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpd5dr4oz8/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6508671Z 2023-01-11T22:40:48.6508777Z Running tests... 2023-01-11T22:40:48.6509035Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6509341Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6509643Z test_collectives (__main__.GlooProcessGroupWithDispatchedCollectivesTests) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6509859Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 78912 2023-01-11T22:40:48.6510224Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6510463Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6510845Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6511035Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6511285Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4k1odoxd 2023-01-11T22:40:48.6511548Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4k1odoxd/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6511774Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6511998Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6512396Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:40:48.6512498Z ok (3.743s) 2023-01-11T22:40:48.6512518Z 2023-01-11T22:40:48.6512777Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6512887Z Ran 1 test in 3.744s 2023-01-11T22:40:48.6512907Z 2023-01-11T22:40:48.6513042Z OK 2023-01-11T22:40:48.6513063Z 2023-01-11T22:40:48.6513187Z Generating XML reports... 
2023-01-11T22:40:48.6513735Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-GlooProcessGroupWithDispatchedCollectivesTests-20230111223430.xml 2023-01-11T22:40:48.6514082Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6514254Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6514627Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6514816Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6515072Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp13hnano2 2023-01-11T22:40:48.6515335Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp13hnano2/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6515357Z 2023-01-11T22:40:48.6515464Z Running tests... 2023-01-11T22:40:48.6515724Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6516027Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6516335Z test_monitored_barrier (__main__.GlooProcessGroupWithDispatchedCollectivesTests) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6516553Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 78984 2023-01-11T22:40:48.6516918Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6517092Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6517470Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6517664Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6517914Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqwk1b22g 2023-01-11T22:40:48.6518179Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqwk1b22g/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6518389Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6518630Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6519025Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:40:48.6519126Z ok (3.823s) 2023-01-11T22:40:48.6519145Z 2023-01-11T22:40:48.6519400Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6519561Z Ran 1 test in 3.823s 2023-01-11T22:40:48.6519581Z 2023-01-11T22:40:48.6519672Z OK 2023-01-11T22:40:48.6519691Z 2023-01-11T22:40:48.6519815Z Generating XML reports... 
2023-01-11T22:40:48.6520359Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-GlooProcessGroupWithDispatchedCollectivesTests-20230111223436.xml 2023-01-11T22:40:48.6520707Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6520881Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6521253Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6521440Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6521692Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmv2fkay2 2023-01-11T22:40:48.6521961Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmv2fkay2/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6521980Z 2023-01-11T22:40:48.6522087Z Running tests... 2023-01-11T22:40:48.6522391Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6522711Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6522940Z test_allgather_basics (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6523160Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 79056 2023-01-11T22:40:48.6523374Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 79057 2023-01-11T22:40:48.6523583Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 79058 2023-01-11T22:40:48.6523791Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 79059 2023-01-11T22:40:48.6524163Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6524338Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6524715Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6524887Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6525244Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6525412Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6525780Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6525965Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6526320Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6526496Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6526861Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6527047Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6527393Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6527562Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6527924Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6528106Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6528360Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprx9r0k_q 2023-01-11T22:40:48.6528684Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprx9r0k_q/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6528941Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqrbxrpkb 2023-01-11T22:40:48.6529207Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqrbxrpkb/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6529417Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.6529641Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6529888Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7pwcyno9 2023-01-11T22:40:48.6530148Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7pwcyno9/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6530397Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0damsnj8 2023-01-11T22:40:48.6530662Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0damsnj8/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6530885Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.6531152Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6531242Z ok (4.048s) 
2023-01-11T22:40:48.6531281Z 2023-01-11T22:40:48.6531530Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6531641Z Ran 1 test in 4.048s 2023-01-11T22:40:48.6531660Z 2023-01-11T22:40:48.6531752Z OK 2023-01-11T22:40:48.6531772Z 2023-01-11T22:40:48.6531893Z Generating XML reports... 2023-01-11T22:40:48.6532320Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223442.xml 2023-01-11T22:40:48.6532683Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6532860Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6533235Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6533410Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6533660Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpin9q9hkc 2023-01-11T22:40:48.6533922Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpin9q9hkc/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6533942Z 2023-01-11T22:40:48.6534049Z Running tests... 2023-01-11T22:40:48.6534313Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6534617Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6534866Z test_allgather_basics_cuda (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6535084Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 79239 2023-01-11T22:40:48.6535282Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 79240 2023-01-11T22:40:48.6535497Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 79241 2023-01-11T22:40:48.6535705Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 79242 2023-01-11T22:40:48.6536072Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6536243Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6536880Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6537076Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6537443Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6537717Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6538077Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6538262Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6538625Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6538794Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6539158Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6539338Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6539692Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6539866Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6540212Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6540455Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6540713Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgxne20r9 2023-01-11T22:40:48.6540979Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgxne20r9/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6541201Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.6541449Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfjrc3w43 2023-01-11T22:40:48.6541709Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfjrc3w43/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6541935Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6542185Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsz38jgdw 2023-01-11T22:40:48.6542434Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsz38jgdw/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6542683Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp58rpk2gh 2023-01-11T22:40:48.6542941Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp58rpk2gh/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6543160Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.6543383Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6543485Z ok (6.059s) 
2023-01-11T22:40:48.6543505Z 2023-01-11T22:40:48.6543769Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6543879Z Ran 1 test in 6.059s 2023-01-11T22:40:48.6543902Z 2023-01-11T22:40:48.6543977Z OK 2023-01-11T22:40:48.6544013Z 2023-01-11T22:40:48.6544118Z Generating XML reports... 2023-01-11T22:40:48.6544549Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223449.xml 2023-01-11T22:40:48.6544916Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6545090Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6545461Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6545648Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6545901Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwv5sa3sp 2023-01-11T22:40:48.6546167Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwv5sa3sp/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6546238Z 2023-01-11T22:40:48.6546333Z Running tests... 2023-01-11T22:40:48.6546595Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6546910Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6547156Z test_allgather_checks (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6547367Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 79426 2023-01-11T22:40:48.6547580Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 79427 2023-01-11T22:40:48.6547793Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 79428 2023-01-11T22:40:48.6548004Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 79429 2023-01-11T22:40:48.6548352Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6548528Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6548897Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6549138Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6549510Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6549682Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6550049Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6550231Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6550586Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6550741Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6551117Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6551303Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6551660Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6551829Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6552196Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6552383Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6552635Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzqapc1g1 2023-01-11T22:40:48.6552884Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzqapc1g1/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6553136Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9jzqovun 2023-01-11T22:40:48.6553402Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9jzqovun/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6553628Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6553847Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.6554100Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpx4vj2ky3 2023-01-11T22:40:48.6554361Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpx4vj2ky3/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6554582Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6554828Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8obxnchh 2023-01-11T22:40:48.6555074Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8obxnchh/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6555354Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.6555458Z ok (4.096s) 
2023-01-11T22:40:48.6555478Z 2023-01-11T22:40:48.6555743Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6555854Z Ran 1 test in 4.096s 2023-01-11T22:40:48.6555874Z 2023-01-11T22:40:48.6555964Z OK 2023-01-11T22:40:48.6555983Z 2023-01-11T22:40:48.6556106Z Generating XML reports... 2023-01-11T22:40:48.6556531Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223457.xml 2023-01-11T22:40:48.6556877Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6557051Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6557421Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6557612Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6557905Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6tgvsu41 2023-01-11T22:40:48.6558177Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6tgvsu41/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6558197Z 2023-01-11T22:40:48.6558305Z Running tests... 2023-01-11T22:40:48.6558568Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6558874Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6559117Z test_allgather_coalesced_async (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6559332Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 79609 2023-01-11T22:40:48.6559545Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 79610 2023-01-11T22:40:48.6559761Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 79611 2023-01-11T22:40:48.6559974Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 79612 2023-01-11T22:40:48.6560341Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6560514Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6560892Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6561063Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6561425Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6561596Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6561966Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6562154Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6562509Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6562681Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6563046Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6563230Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6563573Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6563743Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6564106Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6564373Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6564633Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnmgsqp32 2023-01-11T22:40:48.6564899Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnmgsqp32/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6565150Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpq16ah6pt 2023-01-11T22:40:48.6565414Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpq16ah6pt/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6565624Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6565848Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6566093Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9dfazbua 2023-01-11T22:40:48.6566362Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9dfazbua/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6566634Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.6566892Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnjeostqy 2023-01-11T22:40:48.6567156Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnjeostqy/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6567377Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.6567619Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.6567843Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:40:48.6568080Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.6568319Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:40:48.6568725Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:40:48.6569113Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:40:48.6569847Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2588: UserWarning: torch.distributed.all_gather_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2023-01-11T22:40:48.6569955Z warnings.warn( 2023-01-11T22:40:48.6570682Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2588: UserWarning: torch.distributed.all_gather_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2023-01-11T22:40:48.6570845Z warnings.warn( 2023-01-11T22:40:48.6571251Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:40:48.6571624Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 
2023-01-11T22:40:48.6572348Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2588: UserWarning: torch.distributed.all_gather_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2023-01-11T22:40:48.6572457Z warnings.warn( 2023-01-11T22:40:48.6573173Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2588: UserWarning: torch.distributed.all_gather_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2023-01-11T22:40:48.6573343Z warnings.warn( 2023-01-11T22:40:48.6573445Z ok (4.144s) 2023-01-11T22:40:48.6573465Z 2023-01-11T22:40:48.6573730Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6573842Z Ran 1 test in 4.144s 2023-01-11T22:40:48.6573862Z 2023-01-11T22:40:48.6573951Z OK 2023-01-11T22:40:48.6573970Z 2023-01-11T22:40:48.6574093Z Generating XML reports... 
2023-01-11T22:40:48.6574501Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223503.xml 2023-01-11T22:40:48.6574864Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6575039Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6575414Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6575608Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6575902Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpab4vq9wf 2023-01-11T22:40:48.6576177Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpab4vq9wf/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6576197Z 2023-01-11T22:40:48.6576304Z Running tests... 2023-01-11T22:40:48.6576785Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6577116Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6577382Z test_allgather_coalesced_checks (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6577599Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 79792 2023-01-11T22:40:48.6577816Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 79793 2023-01-11T22:40:48.6578036Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 79794 2023-01-11T22:40:48.6578248Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 79795 2023-01-11T22:40:48.6578620Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6578780Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6579158Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6579344Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6579703Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6579875Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6580237Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6580412Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6580786Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6580972Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6581329Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6581515Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6581871Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6582043Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6582412Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6582689Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6582951Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpg435dwwn 2023-01-11T22:40:48.6583218Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpg435dwwn/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6583469Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpckasn14x 2023-01-11T22:40:48.6583716Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpckasn14x/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6583963Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4riruzki 2023-01-11T22:40:48.6584225Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4riruzki/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6584472Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9zmcpds1 2023-01-11T22:40:48.6584736Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9zmcpds1/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6585020Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6585254Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.6585471Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6585676Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.6586420Z 
/opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2588: UserWarning: torch.distributed.all_gather_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2023-01-11T22:40:48.6586532Z warnings.warn( 2023-01-11T22:40:48.6587264Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2588: UserWarning: torch.distributed.all_gather_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2023-01-11T22:40:48.6587376Z warnings.warn( 2023-01-11T22:40:48.6588094Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2588: UserWarning: torch.distributed.all_gather_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2023-01-11T22:40:48.6588203Z warnings.warn( 2023-01-11T22:40:48.6588916Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2588: UserWarning: torch.distributed.all_gather_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2023-01-11T22:40:48.6589029Z warnings.warn( 2023-01-11T22:40:48.6589129Z ok (4.005s) 2023-01-11T22:40:48.6589149Z 2023-01-11T22:40:48.6589412Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6589507Z Ran 1 test in 4.006s 2023-01-11T22:40:48.6589530Z 2023-01-11T22:40:48.6589622Z OK 2023-01-11T22:40:48.6589642Z 2023-01-11T22:40:48.6589765Z Generating XML reports... 
2023-01-11T22:40:48.6590186Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223510.xml 2023-01-11T22:40:48.6590550Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6590725Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6591098Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6591284Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6591575Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6cuj9ugl 2023-01-11T22:40:48.6591845Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6cuj9ugl/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6591865Z 2023-01-11T22:40:48.6591973Z Running tests... 2023-01-11T22:40:48.6592238Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6592543Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6592808Z test_allgather_noncontiguous_input (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6593022Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 79975 2023-01-11T22:40:48.6593236Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 79976 2023-01-11T22:40:48.6593434Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 79977 2023-01-11T22:40:48.6593652Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 79978 2023-01-11T22:40:48.6594063Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6594241Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6594615Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6594802Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6595161Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6595325Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6595701Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6595886Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6596245Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6596422Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6596773Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6596953Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6597310Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6597477Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6597849Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6598035Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6598293Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8d028495 2023-01-11T22:40:48.6598553Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8d028495/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6598762Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.6599012Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpowkqlgbd 2023-01-11T22:40:48.6599279Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpowkqlgbd/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6599522Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpr1oaofsb 2023-01-11T22:40:48.6599781Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpr1oaofsb/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6599999Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.6600316Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpaidyxffq 2023-01-11T22:40:48.6600580Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpaidyxffq/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6600806Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6601013Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6601112Z ok (4.044s) 
2023-01-11T22:40:48.6601132Z 2023-01-11T22:40:48.6601398Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6601511Z Ran 1 test in 4.044s 2023-01-11T22:40:48.6601530Z 2023-01-11T22:40:48.6601622Z OK 2023-01-11T22:40:48.6601641Z 2023-01-11T22:40:48.6601764Z Generating XML reports... 2023-01-11T22:40:48.6602191Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223516.xml 2023-01-11T22:40:48.6602555Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6602717Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6603136Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6603326Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6603576Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6r62670d 2023-01-11T22:40:48.6603839Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6r62670d/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6603859Z 2023-01-11T22:40:48.6603967Z Running tests... 2023-01-11T22:40:48.6604230Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6604539Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6604790Z test_allgather_stress (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6604988Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 80158 2023-01-11T22:40:48.6605204Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 80159 2023-01-11T22:40:48.6605417Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 80160 2023-01-11T22:40:48.6605629Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 80161 2023-01-11T22:40:48.6605993Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6606165Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6606537Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6606724Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6607067Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6607240Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6607608Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6607791Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6608145Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6608316Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6608687Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6608871Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6609221Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6609431Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6609805Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6609993Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6610246Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpk5_ovh_p 2023-01-11T22:40:48.6610513Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpk5_ovh_p/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6610763Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdq974d4j 2023-01-11T22:40:48.6611026Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdq974d4j/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6611247Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6611479Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvqmavzpx 2023-01-11T22:40:48.6611746Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.6612019Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvqmavzpx/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6612243Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6612490Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdsnrqzn5 2023-01-11T22:40:48.6612756Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdsnrqzn5/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6612978Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.6613078Z ok (4.613s) 
2023-01-11T22:40:48.6613098Z 2023-01-11T22:40:48.6613362Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6613462Z Ran 1 test in 4.613s 2023-01-11T22:40:48.6613481Z 2023-01-11T22:40:48.6613575Z OK 2023-01-11T22:40:48.6613594Z 2023-01-11T22:40:48.6613720Z Generating XML reports... 2023-01-11T22:40:48.6614144Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223522.xml 2023-01-11T22:40:48.6614509Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6614678Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6615053Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6615237Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6615468Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpl2dlm6p2 2023-01-11T22:40:48.6615736Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpl2dlm6p2/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6615756Z 2023-01-11T22:40:48.6615864Z Running tests... 2023-01-11T22:40:48.6616129Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6616440Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6617092Z test_allgather_stress_cuda (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6617323Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 80365 2023-01-11T22:40:48.6617535Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 80366 2023-01-11T22:40:48.6617747Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 80367 2023-01-11T22:40:48.6617938Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 80368 2023-01-11T22:40:48.6618319Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6618588Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6618973Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6619161Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6619525Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6619697Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6620072Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6620243Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6620596Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6620769Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6621196Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6621391Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6621758Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6621928Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6622292Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6622476Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6622714Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphjyk6mu_ 2023-01-11T22:40:48.6622984Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphjyk6mu_/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6623213Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6623467Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpd73n3u32 2023-01-11T22:40:48.6623736Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpd73n3u32/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6623984Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpg9fjykyp 2023-01-11T22:40:48.6624248Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpg9fjykyp/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6624468Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6624672Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.6624924Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpv5dfepaq 2023-01-11T22:40:48.6625191Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpv5dfepaq/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6625418Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.6625518Z ok (7.401s) 
2023-01-11T22:40:48.6625538Z 2023-01-11T22:40:48.6625805Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6625917Z Ran 1 test in 7.401s 2023-01-11T22:40:48.6625936Z 2023-01-11T22:40:48.6626026Z OK 2023-01-11T22:40:48.6626045Z 2023-01-11T22:40:48.6626165Z Generating XML reports... 2023-01-11T22:40:48.6626576Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223529.xml 2023-01-11T22:40:48.6626941Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6627113Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6627550Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6627744Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6627996Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnbqes_pc 2023-01-11T22:40:48.6628261Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnbqes_pc/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6628281Z 2023-01-11T22:40:48.6628389Z Running tests... 2023-01-11T22:40:48.6628632Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6628939Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6629184Z test_allreduce_basics (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6629401Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 80576 2023-01-11T22:40:48.6629618Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 80577 2023-01-11T22:40:48.6629877Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 80578 2023-01-11T22:40:48.6630096Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 80579 2023-01-11T22:40:48.6630462Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6630618Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6630990Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6631179Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6631542Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6631718Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6632086Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6632271Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6632630Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6632802Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6633147Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6633329Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6633683Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6633852Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6634226Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6634412Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6634663Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpy_o64xi6 2023-01-11T22:40:48.6634913Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1ehusfy7 2023-01-11T22:40:48.6635173Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpy_o64xi6/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6635416Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1ehusfy7/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6635668Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpu14rztxq 2023-01-11T22:40:48.6635932Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpu14rztxq/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6636214Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.6636442Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.6636667Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6636916Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptmo9lpse 2023-01-11T22:40:48.6637176Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptmo9lpse/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6637379Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6637479Z ok (4.027s) 
2023-01-11T22:40:48.6637499Z 2023-01-11T22:40:48.6637763Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6637874Z Ran 1 test in 4.028s 2023-01-11T22:40:48.6637893Z 2023-01-11T22:40:48.6637990Z OK 2023-01-11T22:40:48.6638010Z 2023-01-11T22:40:48.6638133Z Generating XML reports... 2023-01-11T22:40:48.6638608Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223539.xml 2023-01-11T22:40:48.6638983Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6639159Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6639515Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6639706Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6639955Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpufii0526 2023-01-11T22:40:48.6640219Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpufii0526/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6640243Z 2023-01-11T22:40:48.6640353Z Running tests... 2023-01-11T22:40:48.6640611Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6640920Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6641170Z test_allreduce_basics_cuda (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6641368Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 80759 2023-01-11T22:40:48.6641581Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 80760 2023-01-11T22:40:48.6641790Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 80761 2023-01-11T22:40:48.6642002Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 80762 2023-01-11T22:40:48.6642373Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6642544Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6642921Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6643112Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6643455Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6643627Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6643993Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6644181Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6644537Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6644709Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6645125Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6645296Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6645673Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6645842Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6646212Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6646396Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6646648Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7pxvxr3m 2023-01-11T22:40:48.6646920Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7pxvxr3m/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6647149Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.6647400Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzidedyq8 2023-01-11T22:40:48.6647711Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzidedyq8/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6647925Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.6648170Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5czgssnh 2023-01-11T22:40:48.6648433Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5czgssnh/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6648680Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_fhy34lf 2023-01-11T22:40:48.6648939Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_fhy34lf/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6649161Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6649390Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6649492Z ok (5.923s) 
2023-01-11T22:40:48.6649515Z 2023-01-11T22:40:48.6649782Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6649876Z Ran 1 test in 5.924s 2023-01-11T22:40:48.6649896Z 2023-01-11T22:40:48.6649988Z OK 2023-01-11T22:40:48.6650008Z 2023-01-11T22:40:48.6650131Z Generating XML reports... 2023-01-11T22:40:48.6650558Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223545.xml 2023-01-11T22:40:48.6650923Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6651095Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6651469Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6651661Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6651899Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxz85r3_3 2023-01-11T22:40:48.6652162Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxz85r3_3/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6652182Z 2023-01-11T22:40:48.6652290Z Running tests... 2023-01-11T22:40:48.6652553Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6652863Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6653137Z test_allreduce_basics_cuda_using_work_api (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6653354Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 80946 2023-01-11T22:40:48.6653568Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 80947 2023-01-11T22:40:48.6653833Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 80948 2023-01-11T22:40:48.6654029Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 80949 2023-01-11T22:40:48.6654397Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6654571Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6654946Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6655135Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6655494Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6655665Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6656037Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6656208Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6656872Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6657070Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6657446Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6657635Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6658000Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6658173Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6658536Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6658725Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6658964Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnzy3f9yt 2023-01-11T22:40:48.6659229Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnzy3f9yt/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6659481Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpo73kp6ja 2023-01-11T22:40:48.6659746Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpo73kp6ja/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6659994Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwqzr533u 2023-01-11T22:40:48.6660255Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwqzr533u/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6660481Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.6660706Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.6660932Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6661163Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8tl0b2aq 2023-01-11T22:40:48.6661422Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8tl0b2aq/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6661639Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6661739Z ok (5.935s) 
2023-01-11T22:40:48.6661759Z 2023-01-11T22:40:48.6662029Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6662143Z Ran 1 test in 5.935s 2023-01-11T22:40:48.6662162Z 2023-01-11T22:40:48.6662257Z OK 2023-01-11T22:40:48.6662276Z 2023-01-11T22:40:48.6662398Z Generating XML reports... 2023-01-11T22:40:48.6662903Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223554.xml 2023-01-11T22:40:48.6663271Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6663446Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6663823Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6664010Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6664259Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpuyyfkuja 2023-01-11T22:40:48.6664525Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpuyyfkuja/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6664544Z 2023-01-11T22:40:48.6664654Z Running tests... 2023-01-11T22:40:48.6664897Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6665210Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6665525Z test_allreduce_basics_using_work_api (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6665750Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 81133 2023-01-11T22:40:48.6665965Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 81134 2023-01-11T22:40:48.6666179Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 81135 2023-01-11T22:40:48.6666389Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 81136 2023-01-11T22:40:48.6666759Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6666935Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6667292Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6667487Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6667850Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6668077Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6668457Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6668644Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6669008Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6669180Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6669525Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6669716Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6670077Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6670252Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6670616Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6670834Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6671098Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpe1yacwmd 2023-01-11T22:40:48.6671367Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpe1yacwmd/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6671594Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6671889Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpldtx4bt1 2023-01-11T22:40:48.6672154Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpldtx4bt1/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6672386Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.6672637Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpyh6qu223 2023-01-11T22:40:48.6672899Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpyh6qu223/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6673150Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpccgd2qbx 2023-01-11T22:40:48.6673409Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpccgd2qbx/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6673633Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.6673857Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6673946Z ok (4.151s) 
2023-01-11T22:40:48.6673967Z 2023-01-11T22:40:48.6674238Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6674397Z Ran 1 test in 4.151s 2023-01-11T22:40:48.6674418Z 2023-01-11T22:40:48.6674513Z OK 2023-01-11T22:40:48.6674532Z 2023-01-11T22:40:48.6674657Z Generating XML reports... 2023-01-11T22:40:48.6675085Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223602.xml 2023-01-11T22:40:48.6675449Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6675625Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6675978Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6676167Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6676426Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpju1bevu_ 2023-01-11T22:40:48.6676695Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpju1bevu_/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6676714Z 2023-01-11T22:40:48.6676820Z Running tests... 2023-01-11T22:40:48.6677083Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6677390Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6677631Z test_allreduce_checks (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6677827Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 81316 2023-01-11T22:40:48.6678041Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 81317 2023-01-11T22:40:48.6678255Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 81318 2023-01-11T22:40:48.6678472Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 81319 2023-01-11T22:40:48.6678844Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6679018Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6679393Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6679585Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6679945Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6680101Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6680467Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6680706Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6681066Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6681241Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6681616Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6681802Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6682156Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6682308Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6682672Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6682859Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6683116Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpx_8olaeu 2023-01-11T22:40:48.6683428Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpx_8olaeu/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6683686Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplj1dg5l2 2023-01-11T22:40:48.6683952Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplj1dg5l2/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6684201Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgbnrutmr 2023-01-11T22:40:48.6684466Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgbnrutmr/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6684696Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvl0mlmtn 2023-01-11T22:40:48.6684960Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvl0mlmtn/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6685192Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.6685422Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.6685642Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6685865Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6685964Z ok (4.141s) 
2023-01-11T22:40:48.6685984Z 2023-01-11T22:40:48.6686315Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6686415Z Ran 1 test in 4.142s 2023-01-11T22:40:48.6686489Z 2023-01-11T22:40:48.6686565Z OK 2023-01-11T22:40:48.6686583Z 2023-01-11T22:40:48.6686783Z Generating XML reports... 2023-01-11T22:40:48.6687254Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223608.xml 2023-01-11T22:40:48.6687677Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6687888Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6688305Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6688509Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6688906Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpi1xi2i6d 2023-01-11T22:40:48.6689212Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpi1xi2i6d/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6689232Z 2023-01-11T22:40:48.6689324Z Running tests... 2023-01-11T22:40:48.6689625Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6689968Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6690341Z test_allreduce_coalesced_async (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6690598Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 81499 2023-01-11T22:40:48.6690849Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 81500 2023-01-11T22:40:48.6691097Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 81501 2023-01-11T22:40:48.6691386Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 81502 2023-01-11T22:40:48.6691835Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6691995Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6692417Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6692641Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6693043Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6693296Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6693710Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6693935Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6694367Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6694523Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6694938Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6695157Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6695560Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6695764Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6696171Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6696390Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6696891Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpr8f3qahp 2023-01-11T22:40:48.6697291Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpr8f3qahp/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6697508Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6697794Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqbkt26ny 2023-01-11T22:40:48.6698091Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqbkt26ny/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6698377Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0mmahpcx 2023-01-11T22:40:48.6698678Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0mmahpcx/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6698966Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpyoyd5wmg 2023-01-11T22:40:48.6699267Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpyoyd5wmg/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6699538Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.6699745Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6700040Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.6700318Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
2023-01-11T22:40:48.6700681Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1
2023-01-11T22:40:48.6700961Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3
2023-01-11T22:40:48.6701234Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2
2023-01-11T22:40:48.6701696Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes.
2023-01-11T22:40:48.6702127Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes.
2023-01-11T22:40:48.6702582Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes.
2023-01-11T22:40:48.6703397Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1714: UserWarning: torch.distributed.all_reduce_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions
2023-01-11T22:40:48.6703503Z warnings.warn(
2023-01-11T22:40:48.6704336Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1714: UserWarning: torch.distributed.all_reduce_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions
2023-01-11T22:40:48.6704494Z warnings.warn(
2023-01-11T22:40:48.6705263Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1714: UserWarning: torch.distributed.all_reduce_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions
2023-01-11T22:40:48.6705420Z warnings.warn(
2023-01-11T22:40:48.6705850Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes.
2023-01-11T22:40:48.6706615Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1714: UserWarning: torch.distributed.all_reduce_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions
2023-01-11T22:40:48.6706851Z warnings.warn(
2023-01-11T22:40:48.6707028Z ok (4.099s)
2023-01-11T22:40:48.6707050Z
2023-01-11T22:40:48.6707355Z ----------------------------------------------------------------------
2023-01-11T22:40:48.6707452Z Ran 1 test in 4.099s
2023-01-11T22:40:48.6707471Z
2023-01-11T22:40:48.6707608Z OK
2023-01-11T22:40:48.6707630Z
2023-01-11T22:40:48.6707788Z Generating XML reports...
2023-01-11T22:40:48.6708254Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223615.xml 2023-01-11T22:40:48.6708683Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6708900Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6709317Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6709580Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6709819Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpd9kidxri 2023-01-11T22:40:48.6729397Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpd9kidxri/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6729429Z 2023-01-11T22:40:48.6729580Z Running tests... 2023-01-11T22:40:48.6729887Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6730211Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6730465Z test_allreduce_coalesced_basics (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6730807Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 81682 2023-01-11T22:40:48.6731029Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 81683 2023-01-11T22:40:48.6731243Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 81684 2023-01-11T22:40:48.6731449Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 81685 2023-01-11T22:40:48.6731831Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6732006Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6732383Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6732557Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6732924Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6733094Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6733517Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6733709Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6734076Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6734251Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6734623Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6734811Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6735153Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6735330Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6735701Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6735885Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6736143Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzexhdx7m 2023-01-11T22:40:48.6736414Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzexhdx7m/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6736964Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.6737231Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1590cpxv 2023-01-11T22:40:48.6737482Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1590cpxv/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6737735Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpz1kdc3f8 2023-01-11T22:40:48.6737999Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpz1kdc3f8/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6738223Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6738471Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgb95d793 2023-01-11T22:40:48.6738730Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgb95d793/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6738953Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.6739169Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6739273Z ok (4.040s) 
2023-01-11T22:40:48.6739294Z 2023-01-11T22:40:48.6739553Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6739777Z Ran 1 test in 4.040s 2023-01-11T22:40:48.6739797Z 2023-01-11T22:40:48.6739890Z OK 2023-01-11T22:40:48.6739909Z 2023-01-11T22:40:48.6740035Z Generating XML reports... 2023-01-11T22:40:48.6740473Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223621.xml 2023-01-11T22:40:48.6740834Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6741009Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6741384Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6741557Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6741804Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpd3mf3m8h 2023-01-11T22:40:48.6742068Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpd3mf3m8h/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6742088Z 2023-01-11T22:40:48.6742195Z Running tests... 2023-01-11T22:40:48.6742521Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6742845Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6743109Z test_allreduce_coalesced_checks (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6743325Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 81865 2023-01-11T22:40:48.6743537Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 81866 2023-01-11T22:40:48.6743734Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 81867 2023-01-11T22:40:48.6743943Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 81868 2023-01-11T22:40:48.6744310Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6744488Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6744866Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6745053Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6745409Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6745580Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6745929Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6746113Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6746474Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6746649Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6747010Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6747191Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6747547Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6747718Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6748078Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6748245Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6748495Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpttwa5ckg 2023-01-11T22:40:48.6748835Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpttwa5ckg/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6749089Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpczfjqzqz 2023-01-11T22:40:48.6749355Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpczfjqzqz/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6749579Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.6749803Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6750051Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6o2ccy5a 2023-01-11T22:40:48.6750295Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6o2ccy5a/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6750518Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6750766Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpu_mzw8l3 2023-01-11T22:40:48.6751030Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpu_mzw8l3/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6751298Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.6751403Z ok (4.108s) 
2023-01-11T22:40:48.6751424Z 2023-01-11T22:40:48.6751692Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6751803Z Ran 1 test in 4.108s 2023-01-11T22:40:48.6751823Z 2023-01-11T22:40:48.6751916Z OK 2023-01-11T22:40:48.6751934Z 2023-01-11T22:40:48.6752040Z Generating XML reports... 2023-01-11T22:40:48.6752469Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223628.xml 2023-01-11T22:40:48.6752834Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6753010Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6753383Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6753576Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6753828Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwsmi6ono 2023-01-11T22:40:48.6754093Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwsmi6ono/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6754113Z 2023-01-11T22:40:48.6754221Z Running tests... 2023-01-11T22:40:48.6754465Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6754770Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6755039Z test_allreduce_coalesced_checks_cuda (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6755255Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 82048 2023-01-11T22:40:48.6755470Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 82049 2023-01-11T22:40:48.6755684Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 82050 2023-01-11T22:40:48.6755896Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 82051 2023-01-11T22:40:48.6756259Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6756416Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6756788Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6756972Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6757332Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6757560Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6757936Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6758118Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6758475Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6758631Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6758993Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6759167Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6759541Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6759733Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6760102Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6760331Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6760588Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpb9_uva_h 2023-01-11T22:40:48.6760854Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpb9_uva_h/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6761088Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpylvf9b9k 2023-01-11T22:40:48.6761352Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpylvf9b9k/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6761604Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzmo57lku 2023-01-11T22:40:48.6761864Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzmo57lku/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6762116Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmms3iq2n 2023-01-11T22:40:48.6762378Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmms3iq2n/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6762603Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6762822Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.6763044Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6763251Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.6763351Z ok (5.826s) 
2023-01-11T22:40:48.6763372Z 2023-01-11T22:40:48.6763637Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6763747Z Ran 1 test in 5.827s 2023-01-11T22:40:48.6763771Z 2023-01-11T22:40:48.6763862Z OK 2023-01-11T22:40:48.6763881Z 2023-01-11T22:40:48.6764003Z Generating XML reports... 2023-01-11T22:40:48.6764434Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223634.xml 2023-01-11T22:40:48.6764801Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6764959Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6765333Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6765524Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6765776Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgf_0z93i 2023-01-11T22:40:48.6766040Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgf_0z93i/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6766111Z 2023-01-11T22:40:48.6766223Z Running tests... 2023-01-11T22:40:48.6766484Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6766794Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6767039Z test_allreduce_coalesced_stress (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6767256Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 82235 2023-01-11T22:40:48.6767468Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 82236 2023-01-11T22:40:48.6767679Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 82237 2023-01-11T22:40:48.6767891Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 82238 2023-01-11T22:40:48.6768258Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6768434Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6768806Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6769044Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6769394Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6769566Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6769930Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6770115Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6770470Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6770640Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6771061Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6771255Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6771604Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6771775Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6772133Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6772316Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6772568Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphxdidcs8 2023-01-11T22:40:48.6772835Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphxdidcs8/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6773089Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkajwkbxc 2023-01-11T22:40:48.6773356Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkajwkbxc/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6773581Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.6773813Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjbibff96 2023-01-11T22:40:48.6774074Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjbibff96/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6774295Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.6774518Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6774763Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbwwo0huv 2023-01-11T22:40:48.6775026Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbwwo0huv/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6775308Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6775412Z ok (4.345s) 
2023-01-11T22:40:48.6775432Z 2023-01-11T22:40:48.6775685Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6775798Z Ran 1 test in 4.345s 2023-01-11T22:40:48.6775819Z 2023-01-11T22:40:48.6775911Z OK 2023-01-11T22:40:48.6775930Z 2023-01-11T22:40:48.6776053Z Generating XML reports... 2023-01-11T22:40:48.6776480Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223642.xml 2023-01-11T22:40:48.6777061Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6777237Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6777614Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6777806Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6778116Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpq6_d18ti 2023-01-11T22:40:48.6778390Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpq6_d18ti/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6778410Z 2023-01-11T22:40:48.6778521Z Running tests... 2023-01-11T22:40:48.6778783Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6779087Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6779331Z test_allreduce_stress (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6779548Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 82442 2023-01-11T22:40:48.6779763Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 82443 2023-01-11T22:40:48.6779963Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 82444 2023-01-11T22:40:48.6780179Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 82445 2023-01-11T22:40:48.6780548Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6780724Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6781096Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6781285Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6781642Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6781811Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6782182Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6782352Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6782710Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6782880Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6783239Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6783422Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6783785Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6783955Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6784315Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6784560Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6784815Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpq9z71ed8 2023-01-11T22:40:48.6785080Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpq9z71ed8/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6785335Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpo56p7geg 2023-01-11T22:40:48.6785598Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpo56p7geg/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6785849Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcb3c7eas 2023-01-11T22:40:48.6786110Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcb3c7eas/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6786338Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6786560Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.6786828Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.6787086Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpc7232dv7 2023-01-11T22:40:48.6787346Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpc7232dv7/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6787565Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6787664Z ok (4.239s) 
2023-01-11T22:40:48.6787684Z 2023-01-11T22:40:48.6787952Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6788063Z Ran 1 test in 4.239s 2023-01-11T22:40:48.6788082Z 2023-01-11T22:40:48.6788172Z OK 2023-01-11T22:40:48.6788192Z 2023-01-11T22:40:48.6788296Z Generating XML reports... 2023-01-11T22:40:48.6788725Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223649.xml 2023-01-11T22:40:48.6789095Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6789270Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6789643Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6789830Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6790077Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpd_d60t3a 2023-01-11T22:40:48.6790337Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpd_d60t3a/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6790357Z 2023-01-11T22:40:48.6790465Z Running tests... 2023-01-11T22:40:48.6790709Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6791016Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6791269Z test_allreduce_stress_cuda (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6791484Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 82649 2023-01-11T22:40:48.6791698Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 82650 2023-01-11T22:40:48.6791909Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 82651 2023-01-11T22:40:48.6792116Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 82652 2023-01-11T22:40:48.6792479Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6792636Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6793008Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6793255Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6793624Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6793796Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6794160Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6794346Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6794703Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6794875Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6795226Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6795414Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6795816Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6795992Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6796355Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6796539Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6796792Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7d73cfls 2023-01-11T22:40:48.6797056Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7d73cfls/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6797291Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpp0rbkspo 2023-01-11T22:40:48.6797558Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpp0rbkspo/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6797802Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkwgf2w3_ 2023-01-11T22:40:48.6798064Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkwgf2w3_/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6798287Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6798512Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.6798759Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplfrsa0fr 2023-01-11T22:40:48.6799025Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplfrsa0fr/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6799245Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6799450Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.6799552Z ok (6.296s) 
2023-01-11T22:40:48.6799572Z 2023-01-11T22:40:48.6799837Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6799953Z Ran 1 test in 6.296s 2023-01-11T22:40:48.6799973Z 2023-01-11T22:40:48.6800065Z OK 2023-01-11T22:40:48.6800084Z 2023-01-11T22:40:48.6800204Z Generating XML reports... 2023-01-11T22:40:48.6800629Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223656.xml 2023-01-11T22:40:48.6800996Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6801153Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6801529Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6801717Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6802030Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8i4gu8c4 2023-01-11T22:40:48.6802298Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8i4gu8c4/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6802318Z 2023-01-11T22:40:48.6802426Z Running tests... 2023-01-11T22:40:48.6802689Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6802995Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6803244Z test_barrier_implies_wait (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6803443Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 82860 2023-01-11T22:40:48.6803656Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 82861 2023-01-11T22:40:48.6803868Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 82862 2023-01-11T22:40:48.6804083Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 82863 2023-01-11T22:40:48.6804510Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6804692Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6805067Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6805257Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6805601Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6805772Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6806136Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6806322Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6806677Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6806850Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6807223Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6807407Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6807762Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6807914Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6808279Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6808465Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6808722Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfokwxzlo 2023-01-11T22:40:48.6808995Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfokwxzlo/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6809220Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6809469Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0hpl9316 2023-01-11T22:40:48.6809732Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0hpl9316/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6809962Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzvu2zf0g 2023-01-11T22:40:48.6810224Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzvu2zf0g/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6810446Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.6810725Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.6810981Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsnncegee 2023-01-11T22:40:48.6811243Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsnncegee/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6811464Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6811565Z ok (4.111s) 
2023-01-11T22:40:48.6811585Z 2023-01-11T22:40:48.6811853Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6811948Z Ran 1 test in 4.111s 2023-01-11T22:40:48.6811967Z 2023-01-11T22:40:48.6812061Z OK 2023-01-11T22:40:48.6812080Z 2023-01-11T22:40:48.6812202Z Generating XML reports... 2023-01-11T22:40:48.6812629Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223704.xml 2023-01-11T22:40:48.6812999Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6813172Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6813593Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6813790Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6814023Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpw6a45try 2023-01-11T22:40:48.6814283Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpw6a45try/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6814303Z 2023-01-11T22:40:48.6814410Z Running tests... 2023-01-11T22:40:48.6814673Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6814979Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6815228Z test_broadcast_basics (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6815447Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 83043 2023-01-11T22:40:48.6815663Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 83044 2023-01-11T22:40:48.6815876Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 83045 2023-01-11T22:40:48.6816068Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 83046 2023-01-11T22:40:48.6816435Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6816862Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6817259Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6817446Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6817808Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6817979Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6818346Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6818517Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6818872Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6819042Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6819404Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6819589Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6820043Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6820211Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6820578Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6820762Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6820999Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbjtkehn9 2023-01-11T22:40:48.6821267Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbjtkehn9/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6821517Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0cnx0xey 2023-01-11T22:40:48.6821783Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0cnx0xey/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6822029Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5dnw5i99 2023-01-11T22:40:48.6822290Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5dnw5i99/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6822577Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6822813Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.6823045Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptrfvsmy2 2023-01-11T22:40:48.6823306Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptrfvsmy2/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6823528Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6823748Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.6823851Z ok (4.060s) 
2023-01-11T22:40:48.6823871Z 2023-01-11T22:40:48.6824143Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6824255Z Ran 1 test in 4.061s 2023-01-11T22:40:48.6824275Z 2023-01-11T22:40:48.6824369Z OK 2023-01-11T22:40:48.6824388Z 2023-01-11T22:40:48.6824513Z Generating XML reports... 2023-01-11T22:40:48.6824923Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223711.xml 2023-01-11T22:40:48.6825287Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6825460Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6825835Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6826024Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6826275Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3h7j0ukq 2023-01-11T22:40:48.6826543Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3h7j0ukq/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6826564Z 2023-01-11T22:40:48.6826672Z Running tests... 2023-01-11T22:40:48.6826918Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6827223Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6827473Z test_broadcast_basics_cuda (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6827689Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 83226 2023-01-11T22:40:48.6827904Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 83227 2023-01-11T22:40:48.6828115Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 83228 2023-01-11T22:40:48.6828325Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 83229 2023-01-11T22:40:48.6828750Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6828908Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6829286Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6829473Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6829836Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6830007Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6830379Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6830566Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6830922Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6831094Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6831527Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6831719Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6832073Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6832247Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6832612Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6832793Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6833046Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpt54bfnvh 2023-01-11T22:40:48.6833315Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpt54bfnvh/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6833526Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6833776Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpthjj8ro5 2023-01-11T22:40:48.6834043Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpthjj8ro5/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6834290Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpyk2inkzs 2023-01-11T22:40:48.6834550Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpyk2inkzs/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6834772Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.6835018Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1sqfxgg5 2023-01-11T22:40:48.6835280Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1sqfxgg5/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6835507Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6835713Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.6835813Z ok (5.984s) 
2023-01-11T22:40:48.6835833Z 2023-01-11T22:40:48.6836099Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6836209Z Ran 1 test in 5.984s 2023-01-11T22:40:48.6836229Z 2023-01-11T22:40:48.6836320Z OK 2023-01-11T22:40:48.6836339Z 2023-01-11T22:40:48.6836462Z Generating XML reports... 2023-01-11T22:40:48.6836890Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223717.xml 2023-01-11T22:40:48.6837258Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6837493Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6837854Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6838042Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6838293Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfcr0n9zx 2023-01-11T22:40:48.6838556Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfcr0n9zx/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6838576Z 2023-01-11T22:40:48.6838684Z Running tests... 2023-01-11T22:40:48.6838941Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6839246Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6839489Z test_broadcast_checks (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6839691Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 83413 2023-01-11T22:40:48.6839902Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 83414 2023-01-11T22:40:48.6840160Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 83415 2023-01-11T22:40:48.6840381Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 83416 2023-01-11T22:40:48.6840751Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6840923Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6841297Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6841486Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6841825Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6842002Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6842371Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6842556Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6842910Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6843081Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6843453Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6843638Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6843988Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6844146Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6844517Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6844699Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6844952Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbgcfy7t6 2023-01-11T22:40:48.6845220Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbgcfy7t6/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6845444Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.6845693Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphsoje19i 2023-01-11T22:40:48.6845957Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphsoje19i/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6846167Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.6846469Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpswp41at8 2023-01-11T22:40:48.6846735Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpswp41at8/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6846958Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6847207Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmpw3bb2z 2023-01-11T22:40:48.6847469Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmpw3bb2z/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6847691Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6847791Z ok (4.169s) 
2023-01-11T22:40:48.6847811Z 2023-01-11T22:40:48.6848080Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6848174Z Ran 1 test in 4.169s 2023-01-11T22:40:48.6848197Z 2023-01-11T22:40:48.6848289Z OK 2023-01-11T22:40:48.6848308Z 2023-01-11T22:40:48.6848430Z Generating XML reports... 2023-01-11T22:40:48.6848915Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223725.xml 2023-01-11T22:40:48.6849291Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6849465Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6849839Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6850028Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6850262Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcni2hz3a 2023-01-11T22:40:48.6850524Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcni2hz3a/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6850548Z 2023-01-11T22:40:48.6850656Z Running tests... 2023-01-11T22:40:48.6850915Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6851226Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6851471Z test_broadcast_stress (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6851685Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 83596 2023-01-11T22:40:48.6851899Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 83597 2023-01-11T22:40:48.6852112Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 83598 2023-01-11T22:40:48.6852305Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 83599 2023-01-11T22:40:48.6852670Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6852846Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6853217Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6853410Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6853766Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6853934Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6854298Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6854467Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6854821Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6854990Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6855408Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6855596Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6855956Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6856129Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6856491Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6856907Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6857154Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpoaernnyu 2023-01-11T22:40:48.6857424Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpoaernnyu/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6857678Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6qivg583 2023-01-11T22:40:48.6858018Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6qivg583/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6858276Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpakh3ntar 2023-01-11T22:40:48.6858536Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpakh3ntar/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6858759Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6858979Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.6859184Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.6859430Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp484tpn8x 2023-01-11T22:40:48.6859686Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp484tpn8x/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6859907Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6860011Z ok (4.241s) 
2023-01-11T22:40:48.6860031Z 2023-01-11T22:40:48.6860303Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6860415Z Ran 1 test in 4.241s 2023-01-11T22:40:48.6860435Z 2023-01-11T22:40:48.6860527Z OK 2023-01-11T22:40:48.6860547Z 2023-01-11T22:40:48.6860668Z Generating XML reports... 2023-01-11T22:40:48.6861078Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223732.xml 2023-01-11T22:40:48.6861440Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6861613Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6861984Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6862177Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6862428Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjizfi_vy 2023-01-11T22:40:48.6862690Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjizfi_vy/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6862709Z 2023-01-11T22:40:48.6862816Z Running tests... 2023-01-11T22:40:48.6863060Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6863368Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6863615Z test_broadcast_stress_cuda (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6863829Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 83803 2023-01-11T22:40:48.6864043Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 83804 2023-01-11T22:40:48.6864331Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 83805 2023-01-11T22:40:48.6864543Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 83806 2023-01-11T22:40:48.6864911Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6865085Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6865441Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6865626Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6865988Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6866158Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6866530Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6866762Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6867128Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6867298Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6867644Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6867829Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6868182Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6868349Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6868711Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6868897Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6869151Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp20mlyxct 2023-01-11T22:40:48.6869418Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp20mlyxct/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6869642Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6869875Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpuzxc3gci 2023-01-11T22:40:48.6870137Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpuzxc3gci/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6870384Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbav3z4zi 2023-01-11T22:40:48.6870645Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbav3z4zi/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6870913Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.6871146Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6871397Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpx6p5jvx2 2023-01-11T22:40:48.6871656Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpx6p5jvx2/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6871863Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.6871964Z ok (6.250s) 
2023-01-11T22:40:48.6871984Z 2023-01-11T22:40:48.6872255Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6872367Z Ran 1 test in 6.250s 2023-01-11T22:40:48.6872387Z 2023-01-11T22:40:48.6872479Z OK 2023-01-11T22:40:48.6872498Z 2023-01-11T22:40:48.6872620Z Generating XML reports... 2023-01-11T22:40:48.6873111Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223738.xml 2023-01-11T22:40:48.6873478Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6873651Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6874006Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6874194Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6874441Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpoam4akof 2023-01-11T22:40:48.6874703Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpoam4akof/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6874723Z 2023-01-11T22:40:48.6874830Z Running tests... 2023-01-11T22:40:48.6875090Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6875402Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6875681Z test_empty_tensors (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6875884Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 84014 2023-01-11T22:40:48.6876099Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 84015 2023-01-11T22:40:48.6876307Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 84016 2023-01-11T22:40:48.6876514Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 84017 2023-01-11T22:40:48.6876885Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6877057Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6877428Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6877618Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6877980Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6878135Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6878499Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6878685Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6879040Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6879209Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6879569Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6879756Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6880119Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6880271Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6880634Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6880816Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6881068Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3e04vpur 2023-01-11T22:40:48.6881332Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3e04vpur/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6881557Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6881869Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpu05azo00 2023-01-11T22:40:48.6882129Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpu05azo00/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6882381Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcvdxrj7p 2023-01-11T22:40:48.6882627Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcvdxrj7p/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6882875Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5uwojhtp 2023-01-11T22:40:48.6883137Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5uwojhtp/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6883364Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.6883587Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.6883808Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6883909Z ok (4.149s) 
2023-01-11T22:40:48.6883930Z 2023-01-11T22:40:48.6884195Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6884336Z Ran 1 test in 4.150s 2023-01-11T22:40:48.6884356Z 2023-01-11T22:40:48.6884454Z OK 2023-01-11T22:40:48.6884473Z 2023-01-11T22:40:48.6884596Z Generating XML reports... 2023-01-11T22:40:48.6885025Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223747.xml 2023-01-11T22:40:48.6885389Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6885563Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6885936Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6886126Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6886379Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5eokpb8u 2023-01-11T22:40:48.6886630Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5eokpb8u/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6886649Z 2023-01-11T22:40:48.6886757Z Running tests... 2023-01-11T22:40:48.6887018Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6887325Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6887561Z test_gather_basics (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6887776Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 84197 2023-01-11T22:40:48.6887991Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 84198 2023-01-11T22:40:48.6888200Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 84199 2023-01-11T22:40:48.6888397Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 84200 2023-01-11T22:40:48.6888764Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6888937Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6889312Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6889501Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6889858Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6890029Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6890392Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6890633Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6890980Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6891153Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6891518Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6891705Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6892059Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6892232Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6892598Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6892779Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6893019Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppjyesqfa 2023-01-11T22:40:48.6893337Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppjyesqfa/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6893593Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpolymomxi 2023-01-11T22:40:48.6893859Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpolymomxi/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6894084Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.6894331Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpovkob610 2023-01-11T22:40:48.6894590Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpovkob610/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6894813Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.6895033Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6895271Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8xhlf3_n 2023-01-11T22:40:48.6895534Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8xhlf3_n/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6895756Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6895857Z ok (4.157s) 
2023-01-11T22:40:48.6895877Z 2023-01-11T22:40:48.6896141Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6896252Z Ran 1 test in 4.158s 2023-01-11T22:40:48.6896271Z 2023-01-11T22:40:48.6896364Z OK 2023-01-11T22:40:48.6896383Z 2023-01-11T22:40:48.6896505Z Generating XML reports... 2023-01-11T22:40:48.6897127Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223753.xml 2023-01-11T22:40:48.6897501Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6897674Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6898053Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6898239Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6898492Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpefbcok8o 2023-01-11T22:40:48.6898758Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpefbcok8o/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6898778Z 2023-01-11T22:40:48.6898886Z Running tests... 2023-01-11T22:40:48.6899144Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6899432Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6899766Z test_gather_basics_cuda (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6899990Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 84380 2023-01-11T22:40:48.6900207Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 84381 2023-01-11T22:40:48.6900418Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 84382 2023-01-11T22:40:48.6900628Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 84383 2023-01-11T22:40:48.6900997Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6901171Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6901526Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6901711Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6902072Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6902304Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6902685Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6902870Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6903228Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6903398Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6903744Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6903931Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6904293Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6904463Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6904828Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6905011Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6905263Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmsjlu6mr 2023-01-11T22:40:48.6905529Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmsjlu6mr/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6905775Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6lzasq0y 2023-01-11T22:40:48.6906022Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6lzasq0y/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6906266Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2idcteby 2023-01-11T22:40:48.6906534Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2idcteby/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6906763Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.6907011Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvps1c3_v 2023-01-11T22:40:48.6907273Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvps1c3_v/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6907495Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.6907709Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6907925Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6908009Z ok (5.931s) 
2023-01-11T22:40:48.6908029Z 2023-01-11T22:40:48.6908349Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6908463Z Ran 1 test in 5.931s 2023-01-11T22:40:48.6908482Z 2023-01-11T22:40:48.6908576Z OK 2023-01-11T22:40:48.6908595Z 2023-01-11T22:40:48.6908721Z Generating XML reports... 2023-01-11T22:40:48.6909150Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223800.xml 2023-01-11T22:40:48.6909516Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6909688Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6910042Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6910230Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6910475Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpaylq_r0l 2023-01-11T22:40:48.6910741Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpaylq_r0l/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6910760Z 2023-01-11T22:40:48.6910866Z Running tests... 2023-01-11T22:40:48.6911184Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6911499Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6911734Z test_gather_checks (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6911932Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 84567 2023-01-11T22:40:48.6912143Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 84568 2023-01-11T22:40:48.6912354Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 84569 2023-01-11T22:40:48.6912562Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 84570 2023-01-11T22:40:48.6912932Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6913105Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6913476Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6913665Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6914021Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6914174Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6914536Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6914723Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6915077Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6915249Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6915614Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6915797Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6916158Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6916311Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6916673Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6916856Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6917108Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpm_opy4ww 2023-01-11T22:40:48.6917430Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpm_opy4ww/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6917686Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpark86f5q 2023-01-11T22:40:48.6917949Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpark86f5q/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6918172Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.6918416Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpouec8fdl 2023-01-11T22:40:48.6918660Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpouec8fdl/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6918882Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6919130Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpj8fwc68e 2023-01-11T22:40:48.6919393Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpj8fwc68e/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6919613Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6919888Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.6919994Z ok (4.141s) 
2023-01-11T22:40:48.6920014Z 2023-01-11T22:40:48.6920280Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6920374Z Ran 1 test in 4.142s 2023-01-11T22:40:48.6920410Z 2023-01-11T22:40:48.6920484Z OK 2023-01-11T22:40:48.6920503Z 2023-01-11T22:40:48.6920626Z Generating XML reports... 2023-01-11T22:40:48.6921051Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223808.xml 2023-01-11T22:40:48.6921413Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6921589Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6921966Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6922157Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6922405Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3chdjtb5 2023-01-11T22:40:48.6922656Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3chdjtb5/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6922676Z 2023-01-11T22:40:48.6922784Z Running tests... 2023-01-11T22:40:48.6923044Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6923347Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6923609Z test_gather_noncontiguous_input (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6923827Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 84750 2023-01-11T22:40:48.6924043Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 84751 2023-01-11T22:40:48.6924255Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 84752 2023-01-11T22:40:48.6924447Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 84753 2023-01-11T22:40:48.6924813Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6924987Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6925360Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6925551Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6925910Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6926137Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6926508Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6926694Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6927043Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6927215Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6927579Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6927764Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6928121Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6928297Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6928705Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6928895Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6929132Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpw22jmy0g 2023-01-11T22:40:48.6929398Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpw22jmy0g/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6929647Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqbqafjae 2023-01-11T22:40:48.6929913Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqbqafjae/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6930140Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6930363Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.6930615Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1vwt3rbm 2023-01-11T22:40:48.6930879Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1vwt3rbm/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6931101Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.6931331Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2gnfh_g6 2023-01-11T22:40:48.6931593Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2gnfh_g6/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6931814Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6931915Z ok (4.108s) 
2023-01-11T22:40:48.6931935Z 2023-01-11T22:40:48.6932202Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6932313Z Ran 1 test in 4.108s 2023-01-11T22:40:48.6932335Z 2023-01-11T22:40:48.6932426Z OK 2023-01-11T22:40:48.6932445Z 2023-01-11T22:40:48.6932568Z Generating XML reports... 2023-01-11T22:40:48.6932981Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223815.xml 2023-01-11T22:40:48.6933348Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6933520Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6933895Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6934083Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6934334Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvsr51wau 2023-01-11T22:40:48.6934599Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvsr51wau/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6934670Z 2023-01-11T22:40:48.6934784Z Running tests... 2023-01-11T22:40:48.6935043Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6935338Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6935574Z test_gather_stress (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6935787Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 84933 2023-01-11T22:40:48.6936003Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 84934 2023-01-11T22:40:48.6936213Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 84935 2023-01-11T22:40:48.6936423Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 84936 2023-01-11T22:40:48.6937055Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6937234Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6937602Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6937865Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6938240Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6938413Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6938781Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6938966Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6939319Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6939489Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6939863Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6940036Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6940393Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6940561Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6940927Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6941113Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6941365Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5uz222e4 2023-01-11T22:40:48.6941629Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5uz222e4/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6941878Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnis__bgs 2023-01-11T22:40:48.6942124Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnis__bgs/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6942350Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.6942597Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmps8dlbjde 2023-01-11T22:40:48.6942856Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmps8dlbjde/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6943077Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.6943327Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpt601hgbo 2023-01-11T22:40:48.6943586Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpt601hgbo/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6943888Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6944110Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6944200Z ok (4.749s) 
2023-01-11T22:40:48.6944220Z 2023-01-11T22:40:48.6944489Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6944600Z Ran 1 test in 4.749s 2023-01-11T22:40:48.6944620Z 2023-01-11T22:40:48.6944715Z OK 2023-01-11T22:40:48.6944733Z 2023-01-11T22:40:48.6944855Z Generating XML reports... 2023-01-11T22:40:48.6945282Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223821.xml 2023-01-11T22:40:48.6945647Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6945821Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6946176Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6946367Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6946682Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwy02vkmn 2023-01-11T22:40:48.6946955Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwy02vkmn/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6946975Z 2023-01-11T22:40:48.6947083Z Running tests... 2023-01-11T22:40:48.6947346Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6947651Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6947902Z test_gather_stress_cuda (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6948119Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 85140 2023-01-11T22:40:48.6948316Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 85141 2023-01-11T22:40:48.6948532Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 85142 2023-01-11T22:40:48.6948743Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 85143 2023-01-11T22:40:48.6949112Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6949286Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6949661Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6949852Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6950208Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6950362Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6950731Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6950918Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6951275Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6951445Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6951806Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6951990Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6952351Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6952523Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6952868Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6953109Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6953362Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_gzxuxwc 2023-01-11T22:40:48.6953626Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_gzxuxwc/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6953852Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.6954101Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpustszi87 2023-01-11T22:40:48.6954363Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpustszi87/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6954587Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6954819Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdq0dhix2 2023-01-11T22:40:48.6955086Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdq0dhix2/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6955379Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpuuf5c7gh 2023-01-11T22:40:48.6955648Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpuuf5c7gh/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6955871Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6956093Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.6956196Z ok (7.686s) 
2023-01-11T22:40:48.6956216Z 2023-01-11T22:40:48.6956481Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6956592Z Ran 1 test in 7.686s 2023-01-11T22:40:48.6956612Z 2023-01-11T22:40:48.6956686Z OK 2023-01-11T22:40:48.6956705Z 2023-01-11T22:40:48.6956826Z Generating XML reports... 2023-01-11T22:40:48.6957258Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223828.xml 2023-01-11T22:40:48.6957627Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6957800Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6958178Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6958368Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6958619Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7s5jxmhd 2023-01-11T22:40:48.6958868Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7s5jxmhd/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6958904Z 2023-01-11T22:40:48.6958995Z Running tests... 2023-01-11T22:40:48.6959258Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6959568Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6959826Z test_multi_device_constructor (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6960041Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 85351 2023-01-11T22:40:48.6960256Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 85352 2023-01-11T22:40:48.6960469Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 85353 2023-01-11T22:40:48.6960675Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 85354 2023-01-11T22:40:48.6961028Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6961200Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6961575Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6961820Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6962191Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6962364Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6962732Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6962919Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6963260Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6963433Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6963797Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6963985Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6964386Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6964563Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6964927Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6965112Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6965368Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpz1pqb0hm 2023-01-11T22:40:48.6965620Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpz1pqb0hm/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6965844Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.6966100Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8ovo_u5p 2023-01-11T22:40:48.6966359Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8ovo_u5p/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6966585Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.6966831Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6_jwjaan 2023-01-11T22:40:48.6967090Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6_jwjaan/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6967312Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6967543Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcw7b0g5j 2023-01-11T22:40:48.6967802Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcw7b0g5j/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6968084Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6968183Z ok (4.203s) 
2023-01-11T22:40:48.6968203Z 2023-01-11T22:40:48.6968463Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6968565Z Ran 1 test in 4.204s 2023-01-11T22:40:48.6968584Z 2023-01-11T22:40:48.6968665Z OK 2023-01-11T22:40:48.6968685Z 2023-01-11T22:40:48.6968795Z Generating XML reports... 2023-01-11T22:40:48.6969213Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223838.xml 2023-01-11T22:40:48.6969563Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6969727Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6970088Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6970276Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6970621Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpowns5_zs 2023-01-11T22:40:48.6970887Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpowns5_zs/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6970948Z 2023-01-11T22:40:48.6971060Z Running tests... 2023-01-11T22:40:48.6971326Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6971618Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6971857Z test_reduce_basics (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6972071Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 85538 2023-01-11T22:40:48.6972285Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 85539 2023-01-11T22:40:48.6972495Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 85540 2023-01-11T22:40:48.6972710Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 85541 2023-01-11T22:40:48.6973142Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6973323Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6973686Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6973875Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6974236Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6974409Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6974778Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6974967Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6975327Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6975497Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6975864Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6976032Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6976392Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6976744Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6977133Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6977320Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6977577Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3cvw0bh6 2023-01-11T22:40:48.6977844Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3cvw0bh6/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6978094Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5exiqyd9 2023-01-11T22:40:48.6978357Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5exiqyd9/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6978588Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6fx4558_ 2023-01-11T22:40:48.6978848Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6fx4558_/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6979079Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.6979303Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.6979616Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6979868Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8zq54xn9 2023-01-11T22:40:48.6980132Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8zq54xn9/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6980355Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6980438Z ok (4.130s) 
2023-01-11T22:40:48.6980458Z 2023-01-11T22:40:48.6980728Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6980839Z Ran 1 test in 4.130s 2023-01-11T22:40:48.6980859Z 2023-01-11T22:40:48.6980953Z OK 2023-01-11T22:40:48.6980972Z 2023-01-11T22:40:48.6981095Z Generating XML reports... 2023-01-11T22:40:48.6981522Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223845.xml 2023-01-11T22:40:48.6981891Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6982126Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6982518Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6982690Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6982938Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9tob25d8 2023-01-11T22:40:48.6983202Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9tob25d8/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6983222Z 2023-01-11T22:40:48.6983329Z Running tests... 2023-01-11T22:40:48.6983590Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6983897Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6984149Z test_reduce_basics_cuda (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6984370Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 85721 2023-01-11T22:40:48.6984569Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 85722 2023-01-11T22:40:48.6984780Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 85723 2023-01-11T22:40:48.6984986Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 85724 2023-01-11T22:40:48.6985353Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6985528Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6985902Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6986092Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6986454Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6986609Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6986979Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6987166Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6987525Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6987693Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6988058Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6988242Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6988663Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6988833Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6989180Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6989363Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6989612Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp17yk10ia 2023-01-11T22:40:48.6989877Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp17yk10ia/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6990103Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.6990352Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdyd203ea 2023-01-11T22:40:48.6990617Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdyd203ea/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6990864Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0q6ngkye 2023-01-11T22:40:48.6991153Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0q6ngkye/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6991383Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.6991605Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.6991858Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmlezchok 2023-01-11T22:40:48.6992123Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmlezchok/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6992343Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.6992444Z ok (5.944s) 
2023-01-11T22:40:48.6992468Z 2023-01-11T22:40:48.6992736Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6992848Z Ran 1 test in 5.944s 2023-01-11T22:40:48.6992867Z 2023-01-11T22:40:48.6992942Z OK 2023-01-11T22:40:48.6992960Z 2023-01-11T22:40:48.6993088Z Generating XML reports... 2023-01-11T22:40:48.6993516Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223851.xml 2023-01-11T22:40:48.6993882Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6994055Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6994429Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6994621Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6994871Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp93wkb6c8 2023-01-11T22:40:48.6995121Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp93wkb6c8/_remote_module_non_scriptable.py 2023-01-11T22:40:48.6995157Z 2023-01-11T22:40:48.6995247Z Running tests... 2023-01-11T22:40:48.6995512Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.6995819Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.6996056Z test_reduce_checks (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.6996272Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 85908 2023-01-11T22:40:48.6996484Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 85909 2023-01-11T22:40:48.6996695Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 85910 2023-01-11T22:40:48.6996905Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 85911 2023-01-11T22:40:48.6997314Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6997489Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6997869Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6998059Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6998423Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6998595Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.6998967Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.6999153Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.6999494Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.6999668Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7000085Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7000274Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7000631Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7000801Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7001173Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7001358Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7001614Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvv95rp_c 2023-01-11T22:40:48.7001869Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvv95rp_c/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7002120Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpt_adxckf 2023-01-11T22:40:48.7002366Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp24rjgyhf 2023-01-11T22:40:48.7002630Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpt_adxckf/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7002886Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp24rjgyhf/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7003110Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.7003333Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.7003552Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.7003788Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpa9mxwu2s 2023-01-11T22:40:48.7004049Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpa9mxwu2s/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7004269Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.7004369Z ok (4.100s) 
2023-01-11T22:40:48.7004389Z 2023-01-11T22:40:48.7004654Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7004767Z Ran 1 test in 4.100s 2023-01-11T22:40:48.7004787Z 2023-01-11T22:40:48.7004879Z OK 2023-01-11T22:40:48.7004898Z 2023-01-11T22:40:48.7005020Z Generating XML reports... 2023-01-11T22:40:48.7005447Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223859.xml 2023-01-11T22:40:48.7005795Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7006027Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7006406Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7006596Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7006850Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpyap1spjl 2023-01-11T22:40:48.7007114Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpyap1spjl/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7007133Z 2023-01-11T22:40:48.7007241Z Running tests... 2023-01-11T22:40:48.7007501Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7007790Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.7008026Z test_reduce_stress (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.7008243Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 86091 2023-01-11T22:40:48.7008455Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 86092 2023-01-11T22:40:48.7008712Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 86093 2023-01-11T22:40:48.7008930Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 86094 2023-01-11T22:40:48.7009297Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7009470Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7009847Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7010018Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7010375Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7010553Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7010923Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7011106Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7011461Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7011631Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7012004Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7012171Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7012526Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7012698Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7013061Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7013253Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7013504Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp172r_hwc 2023-01-11T22:40:48.7013763Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp172r_hwc/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7013989Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.7014243Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpf81w0qk3 2023-01-11T22:40:48.7014489Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpf81w0qk3/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7014713Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.7015021Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpm0cqp2bq 2023-01-11T22:40:48.7015286Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpm0cqp2bq/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7015533Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpesbzxzf7 2023-01-11T22:40:48.7015795Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpesbzxzf7/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7016014Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.7016237Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.7016320Z ok (4.474s) 
2023-01-11T22:40:48.7016358Z 2023-01-11T22:40:48.7016837Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7016961Z Ran 1 test in 4.474s 2023-01-11T22:40:48.7016989Z 2023-01-11T22:40:48.7017080Z OK 2023-01-11T22:40:48.7017102Z 2023-01-11T22:40:48.7017223Z Generating XML reports... 2023-01-11T22:40:48.7017735Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223906.xml 2023-01-11T22:40:48.7018115Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7018288Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7018662Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7018832Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7019081Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpb262ccuf 2023-01-11T22:40:48.7019345Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpb262ccuf/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7019370Z 2023-01-11T22:40:48.7019475Z Running tests... 2023-01-11T22:40:48.7019733Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7020041Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.7020286Z test_reduce_stress_cuda (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.7020504Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 86298 2023-01-11T22:40:48.7020701Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 86299 2023-01-11T22:40:48.7020911Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 86300 2023-01-11T22:40:48.7021117Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 86301 2023-01-11T22:40:48.7021479Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7021654Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7022028Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7022219Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7022578Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7022747Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7023096Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7023281Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7023635Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7023802Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7024243Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7024432Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7024793Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7024961Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7025305Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7025489Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7025738Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpukeb3siz 2023-01-11T22:40:48.7026006Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpukeb3siz/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7026259Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpijk5_wdw 2023-01-11T22:40:48.7026569Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpijk5_wdw/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7026827Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7me41shu 2023-01-11T22:40:48.7027086Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7me41shu/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7027332Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgnkd5juw 2023-01-11T22:40:48.7027575Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgnkd5juw/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7027800Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.7028023Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.7028250Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.7028470Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.7028574Z ok (6.718s) 
2023-01-11T22:40:48.7028594Z 2023-01-11T22:40:48.7028865Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7028976Z Ran 1 test in 6.718s 2023-01-11T22:40:48.7028996Z 2023-01-11T22:40:48.7029071Z OK 2023-01-11T22:40:48.7029089Z 2023-01-11T22:40:48.7029212Z Generating XML reports... 2023-01-11T22:40:48.7029637Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223913.xml 2023-01-11T22:40:48.7029999Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7030172Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7030546Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7030737Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7030990Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcrxvaybf 2023-01-11T22:40:48.7031253Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcrxvaybf/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7031273Z 2023-01-11T22:40:48.7031363Z Running tests... 2023-01-11T22:40:48.7031623Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7031930Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.7032161Z test_round_robin (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.7032376Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 86509 2023-01-11T22:40:48.7032590Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 86510 2023-01-11T22:40:48.7032858Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 86511 2023-01-11T22:40:48.7033074Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 86512 2023-01-11T22:40:48.7033429Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7033601Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7033973Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7034161Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7034522Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7034692Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7035068Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7035255Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7035673Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7035833Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7036205Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7036390Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7036741Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7036909Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7037278Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7037466Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7037722Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqmbadocw 2023-01-11T22:40:48.7037972Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqmbadocw/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7038223Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpurxgdy0b 2023-01-11T22:40:48.7038487Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpurxgdy0b/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7038738Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxmhughpr 2023-01-11T22:40:48.7039001Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxmhughpr/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7039227Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.7039477Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpuofp7wpc 2023-01-11T22:40:48.7039741Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpuofp7wpc/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7039962Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.7040168Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.7040388Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.7040630Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.7040870Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:40:48.7041109Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.7041396Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:40:48.7041801Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:40:48.7042195Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:40:48.7042587Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:40:48.7042961Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:40:48.7043199Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:40:48.7043436Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:40:48.7043675Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 3 2023-01-11T22:40:48.7043911Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 2 2023-01-11T22:40:48.7044346Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes. 2023-01-11T22:40:48.7044743Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes. 
2023-01-11T22:40:48.7045128Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes. 2023-01-11T22:40:48.7045512Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes. 2023-01-11T22:40:48.7045734Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:40:48.7045969Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:40:48.7046207Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 2 2023-01-11T22:40:48.7046442Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 3 2023-01-11T22:40:48.7046830Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:3 with 4 nodes. 2023-01-11T22:40:48.7047392Z [W ProcessGroupRoundRobin.cpp:12] Warning: ProcessGroupRoundRobin is deprecated and scheduled to be removed after this current release (1.13). Please file an issue on https://github.com/pytorch/pytorch/issues if there are any concerns or issues with this deprecation. (function ProcessGroupRoundRobin) 2023-01-11T22:40:48.7047783Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 4 nodes. 2023-01-11T22:40:48.7048171Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 4 nodes. 2023-01-11T22:40:48.7048558Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:3 with 4 nodes. 2023-01-11T22:40:48.7049101Z [W ProcessGroupRoundRobin.cpp:12] Warning: ProcessGroupRoundRobin is deprecated and scheduled to be removed after this current release (1.13). 
Please file an issue on https://github.com/pytorch/pytorch/issues if there are any concerns or issues with this deprecation. (function ProcessGroupRoundRobin) 2023-01-11T22:40:48.7049629Z [W ProcessGroupRoundRobin.cpp:12] Warning: ProcessGroupRoundRobin is deprecated and scheduled to be removed after this current release (1.13). Please file an issue on https://github.com/pytorch/pytorch/issues if there are any concerns or issues with this deprecation. (function ProcessGroupRoundRobin) 2023-01-11T22:40:48.7050168Z [W ProcessGroupRoundRobin.cpp:12] Warning: ProcessGroupRoundRobin is deprecated and scheduled to be removed after this current release (1.13). Please file an issue on https://github.com/pytorch/pytorch/issues if there are any concerns or issues with this deprecation. (function ProcessGroupRoundRobin) 2023-01-11T22:40:48.7050328Z ok (4.214s) 2023-01-11T22:40:48.7050349Z 2023-01-11T22:40:48.7050598Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7050710Z Ran 1 test in 4.215s 2023-01-11T22:40:48.7050730Z 2023-01-11T22:40:48.7050821Z OK 2023-01-11T22:40:48.7050840Z 2023-01-11T22:40:48.7050964Z Generating XML reports... 
2023-01-11T22:40:48.7051387Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223922.xml 2023-01-11T22:40:48.7051753Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7051927Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7052304Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7052477Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7052774Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdai76751 2023-01-11T22:40:48.7053044Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdai76751/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7053064Z 2023-01-11T22:40:48.7053173Z Running tests... 2023-01-11T22:40:48.7053440Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7053747Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.7054004Z test_round_robin_create_destroy (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.7054222Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 86724 2023-01-11T22:40:48.7054441Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 86725 2023-01-11T22:40:48.7054637Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 86726 2023-01-11T22:40:48.7054851Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 86727 2023-01-11T22:40:48.7055218Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7055392Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7055772Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7055963Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7056323Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7056495Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7057121Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7057317Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7057681Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7057852Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7058218Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7058401Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7058758Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7058931Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7059390Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7059556Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7059811Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvr_itlsc 2023-01-11T22:40:48.7060078Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvr_itlsc/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7060327Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpeerqr4tm 2023-01-11T22:40:48.7060593Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpeerqr4tm/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7060843Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpuh3h9syt 2023-01-11T22:40:48.7061106Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpuh3h9syt/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7061334Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.7061545Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.7061853Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp67zxywrt 2023-01-11T22:40:48.7062124Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp67zxywrt/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7062351Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.7062573Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.7062815Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.7063054Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:40:48.7063291Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:40:48.7063527Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:40:48.7063916Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:40:48.7064311Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:40:48.7064700Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:40:48.7065089Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:40:48.7065326Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:40:48.7065563Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 2 2023-01-11T22:40:48.7065801Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:40:48.7066036Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 3 2023-01-11T22:40:48.7066423Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes. 2023-01-11T22:40:48.7066792Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes. 
2023-01-11T22:40:48.7067177Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes. 2023-01-11T22:40:48.7067563Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes. 2023-01-11T22:40:48.7067799Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:40:48.7068096Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:40:48.7068338Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 2 2023-01-11T22:40:48.7068570Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 3 2023-01-11T22:40:48.7068961Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:3 with 4 nodes. 2023-01-11T22:40:48.7069350Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:3 with 4 nodes. 2023-01-11T22:40:48.7069901Z [W ProcessGroupRoundRobin.cpp:12] Warning: ProcessGroupRoundRobin is deprecated and scheduled to be removed after this current release (1.13). Please file an issue on https://github.com/pytorch/pytorch/issues if there are any concerns or issues with this deprecation. (function ProcessGroupRoundRobin) 2023-01-11T22:40:48.7070490Z [W ProcessGroupRoundRobin.cpp:12] Warning: ProcessGroupRoundRobin is deprecated and scheduled to be removed after this current release (1.13). Please file an issue on https://github.com/pytorch/pytorch/issues if there are any concerns or issues with this deprecation. (function ProcessGroupRoundRobin) 2023-01-11T22:40:48.7070889Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 4 nodes. 
2023-01-11T22:40:48.7071308Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 4 nodes. 2023-01-11T22:40:48.7071838Z [W ProcessGroupRoundRobin.cpp:12] Warning: ProcessGroupRoundRobin is deprecated and scheduled to be removed after this current release (1.13). Please file an issue on https://github.com/pytorch/pytorch/issues if there are any concerns or issues with this deprecation. (function ProcessGroupRoundRobin) 2023-01-11T22:40:48.7072377Z [W ProcessGroupRoundRobin.cpp:12] Warning: ProcessGroupRoundRobin is deprecated and scheduled to be removed after this current release (1.13). Please file an issue on https://github.com/pytorch/pytorch/issues if there are any concerns or issues with this deprecation. (function ProcessGroupRoundRobin) 2023-01-11T22:40:48.7072620Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2023-01-11T22:40:48.7072859Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2023-01-11T22:40:48.7073095Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 3 2023-01-11T22:40:48.7073328Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 2 2023-01-11T22:40:48.7073724Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:4 with 4 nodes. 2023-01-11T22:40:48.7074113Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 4 nodes. 2023-01-11T22:40:48.7074504Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:4 with 4 nodes. 2023-01-11T22:40:48.7074893Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 4 nodes. 
2023-01-11T22:40:48.7075114Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 0 2023-01-11T22:40:48.7075347Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 1 2023-01-11T22:40:48.7075582Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 3 2023-01-11T22:40:48.7075814Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 2 2023-01-11T22:40:48.7076201Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:5 with 4 nodes. 2023-01-11T22:40:48.7076651Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:5 with 4 nodes. 2023-01-11T22:40:48.7077039Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:5 with 4 nodes. 2023-01-11T22:40:48.7077425Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:5 with 4 nodes. 2023-01-11T22:40:48.7077969Z [W ProcessGroupRoundRobin.cpp:12] Warning: ProcessGroupRoundRobin is deprecated and scheduled to be removed after this current release (1.13). Please file an issue on https://github.com/pytorch/pytorch/issues if there are any concerns or issues with this deprecation. (function ProcessGroupRoundRobin) 2023-01-11T22:40:48.7078504Z [W ProcessGroupRoundRobin.cpp:12] Warning: ProcessGroupRoundRobin is deprecated and scheduled to be removed after this current release (1.13). Please file an issue on https://github.com/pytorch/pytorch/issues if there are any concerns or issues with this deprecation. (function ProcessGroupRoundRobin) 2023-01-11T22:40:48.7079090Z [W ProcessGroupRoundRobin.cpp:12] Warning: ProcessGroupRoundRobin is deprecated and scheduled to be removed after this current release (1.13). 
Please file an issue on https://github.com/pytorch/pytorch/issues if there are any concerns or issues with this deprecation. (function ProcessGroupRoundRobin) 2023-01-11T22:40:48.7079632Z [W ProcessGroupRoundRobin.cpp:12] Warning: ProcessGroupRoundRobin is deprecated and scheduled to be removed after this current release (1.13). Please file an issue on https://github.com/pytorch/pytorch/issues if there are any concerns or issues with this deprecation. (function ProcessGroupRoundRobin) 2023-01-11T22:40:48.7079717Z ok (4.450s) 2023-01-11T22:40:48.7079755Z 2023-01-11T22:40:48.7080004Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7080116Z Ran 1 test in 4.450s 2023-01-11T22:40:48.7080139Z 2023-01-11T22:40:48.7080230Z OK 2023-01-11T22:40:48.7080250Z 2023-01-11T22:40:48.7080376Z Generating XML reports... 2023-01-11T22:40:48.7080809Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223928.xml 2023-01-11T22:40:48.7081177Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7081351Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7081725Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7081897Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7082149Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0bzafthw 2023-01-11T22:40:48.7082416Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0bzafthw/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7082440Z 2023-01-11T22:40:48.7082550Z Running tests... 
2023-01-11T22:40:48.7082820Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7083132Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.7083372Z test_scatter_basics (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.7083591Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 86971 2023-01-11T22:40:48.7083790Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 86972 2023-01-11T22:40:48.7084005Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 86973 2023-01-11T22:40:48.7084216Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 86974 2023-01-11T22:40:48.7084586Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7084821Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7085200Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7085393Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7085751Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7085922Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7086270Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7086457Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7086813Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7086982Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 
2023-01-11T22:40:48.7087357Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7087589Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7087954Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7088123Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7088472Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7088662Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7088916Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvtrkbcuo 2023-01-11T22:40:48.7089184Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvtrkbcuo/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7089437Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvtvvbjmq 2023-01-11T22:40:48.7089702Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvtvvbjmq/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7089931Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.7090181Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpetocb2nh 2023-01-11T22:40:48.7090406Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.7090653Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpetocb2nh/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7090876Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.7091122Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpoq9e1tq4 2023-01-11T22:40:48.7091386Z 
INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpoq9e1tq4/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7091614Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.7091715Z ok (4.141s) 2023-01-11T22:40:48.7091737Z 2023-01-11T22:40:48.7092005Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7092118Z Ran 1 test in 4.141s 2023-01-11T22:40:48.7092138Z 2023-01-11T22:40:48.7092213Z OK 2023-01-11T22:40:48.7092249Z 2023-01-11T22:40:48.7092354Z Generating XML reports... 2023-01-11T22:40:48.7092781Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223935.xml 2023-01-11T22:40:48.7093146Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7093319Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7093694Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7093950Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7094204Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0qtw_7rz 2023-01-11T22:40:48.7094466Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0qtw_7rz/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7094486Z 2023-01-11T22:40:48.7094577Z Running tests... 2023-01-11T22:40:48.7094843Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7095150Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.7095399Z test_scatter_basics_cuda (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.7095615Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 87154 2023-01-11T22:40:48.7095827Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 87155 2023-01-11T22:40:48.7096042Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 87156 2023-01-11T22:40:48.7096308Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 87157 2023-01-11T22:40:48.7096909Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7097093Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7097477Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7097667Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7098025Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7098195Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7098558Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7098750Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7099111Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7099266Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7099630Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7099815Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7100178Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7100350Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7100713Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7100900Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7101155Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8lmz5yxy 2023-01-11T22:40:48.7101403Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8lmz5yxy/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7101656Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqyhpesi6 2023-01-11T22:40:48.7101921Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqyhpesi6/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7102147Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.7102372Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.7102619Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6u_p8guj 2023-01-11T22:40:48.7102967Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6u_p8guj/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7103220Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwajzml9y 2023-01-11T22:40:48.7103482Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwajzml9y/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7103687Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.7103912Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.7104013Z ok (5.956s) 
2023-01-11T22:40:48.7104033Z 2023-01-11T22:40:48.7104302Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7104414Z Ran 1 test in 5.956s 2023-01-11T22:40:48.7104434Z 2023-01-11T22:40:48.7104526Z OK 2023-01-11T22:40:48.7104545Z 2023-01-11T22:40:48.7104668Z Generating XML reports... 2023-01-11T22:40:48.7105095Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223941.xml 2023-01-11T22:40:48.7105532Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7105717Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7106095Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7106286Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7106537Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzjd__sd0 2023-01-11T22:40:48.7106804Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzjd__sd0/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7106824Z 2023-01-11T22:40:48.7106935Z Running tests... 2023-01-11T22:40:48.7107195Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7107509Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.7107738Z test_scatter_checks (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.7107954Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 87341 2023-01-11T22:40:48.7108169Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 87342 2023-01-11T22:40:48.7108381Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 87343 2023-01-11T22:40:48.7108588Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 87344 2023-01-11T22:40:48.7108957Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7109131Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7109508Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7109682Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7110043Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7110211Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7110579Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7110762Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7111121Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7111292Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7111655Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7111893Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7112241Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7112415Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7112778Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7112961Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7113213Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpn2t79hm7 2023-01-11T22:40:48.7113477Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpn2t79hm7/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7113725Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8cuml9iu 2023-01-11T22:40:48.7113987Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8cuml9iu/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7114221Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqskv3f0v 2023-01-11T22:40:48.7114529Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqskv3f0v/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7114764Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.7114989Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.7115208Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.7115455Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp904omlog 2023-01-11T22:40:48.7115718Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp904omlog/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7115941Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.7116047Z ok (4.114s) 
2023-01-11T22:40:48.7116067Z 2023-01-11T22:40:48.7116319Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7116435Z Ran 1 test in 4.115s 2023-01-11T22:40:48.7116454Z 2023-01-11T22:40:48.7116546Z OK 2023-01-11T22:40:48.7116565Z 2023-01-11T22:40:48.7116690Z Generating XML reports... 2023-01-11T22:40:48.7117117Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223950.xml 2023-01-11T22:40:48.7117483Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7117657Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7118030Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7118203Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7118455Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4t3go7jw 2023-01-11T22:40:48.7118719Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4t3go7jw/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7118740Z 2023-01-11T22:40:48.7118848Z Running tests... 2023-01-11T22:40:48.7119108Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7119415Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.7119654Z test_scatter_stress (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.7119870Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 87524 2023-01-11T22:40:48.7120088Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 87525 2023-01-11T22:40:48.7120284Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 87526 2023-01-11T22:40:48.7120554Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 87527 2023-01-11T22:40:48.7120926Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7121100Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7121474Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7121663Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7122020Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7122194Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7122542Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7122730Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7123084Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7123298Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7123682Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7123865Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7124219Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7124389Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7124755Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7124926Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7125185Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvt92flzn 2023-01-11T22:40:48.7125453Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvt92flzn/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7125678Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.7125930Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptthto0yn 2023-01-11T22:40:48.7126194Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptthto0yn/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7126418Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.7126667Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfubwvhag 2023-01-11T22:40:48.7126915Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfubwvhag/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7127165Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9gfdey6w 2023-01-11T22:40:48.7127424Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9gfdey6w/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7127647Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.7127869Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.7127972Z ok (4.844s) 
2023-01-11T22:40:48.7127992Z 2023-01-11T22:40:48.7128257Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7128369Z Ran 1 test in 4.844s 2023-01-11T22:40:48.7128389Z 2023-01-11T22:40:48.7128480Z OK 2023-01-11T22:40:48.7128500Z 2023-01-11T22:40:48.7128606Z Generating XML reports... 2023-01-11T22:40:48.7129031Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223956.xml 2023-01-11T22:40:48.7129460Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7129635Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7130011Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7130199Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7130447Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_o_luzcr 2023-01-11T22:40:48.7130710Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_o_luzcr/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7130730Z 2023-01-11T22:40:48.7130821Z Running tests... 2023-01-11T22:40:48.7131083Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7131389Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.7131702Z test_scatter_stress_cuda (__main__.ProcessGroupGlooTest) ... 
skip: Test is flaky, see https://github.com/pytorch/pytorch/issues/15963 (0.001s) 2023-01-11T22:40:48.7131721Z 2023-01-11T22:40:48.7132023Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7132140Z Ran 1 test in 0.001s 2023-01-11T22:40:48.7132159Z 2023-01-11T22:40:48.7132268Z OK (skipped=1) 2023-01-11T22:40:48.7132286Z 2023-01-11T22:40:48.7132408Z Generating XML reports... 2023-01-11T22:40:48.7132840Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224003.xml 2023-01-11T22:40:48.7133186Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7133361Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7133732Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7133927Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7134178Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfcnq_bmo 2023-01-11T22:40:48.7134445Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfcnq_bmo/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7134465Z 2023-01-11T22:40:48.7134572Z Running tests... 2023-01-11T22:40:48.7134829Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7135136Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.7135367Z test_send_recv_all_to_all (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.7135583Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 87764 2023-01-11T22:40:48.7135798Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 87765 2023-01-11T22:40:48.7136013Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 87766 2023-01-11T22:40:48.7136224Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 87767 2023-01-11T22:40:48.7136781Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7136967Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7137352Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7137524Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7137887Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7138060Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7138432Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7138706Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7139073Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7139243Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7139610Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7139799Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7140140Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7140311Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7140674Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7140860Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7141113Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpj7lgy4l1 2023-01-11T22:40:48.7141437Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpj7lgy4l1/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7141673Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.7141922Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0t8716c4 2023-01-11T22:40:48.7142164Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0t8716c4/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7142413Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpyv67wmm0 2023-01-11T22:40:48.7142674Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpyv67wmm0/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7142924Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp34vnl3e9 2023-01-11T22:40:48.7143189Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp34vnl3e9/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7143418Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.7143641Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.7143860Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.7143962Z ok (4.158s) 
2023-01-11T22:40:48.7143982Z 2023-01-11T22:40:48.7144235Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7144346Z Ran 1 test in 4.159s 2023-01-11T22:40:48.7144365Z 2023-01-11T22:40:48.7144458Z OK 2023-01-11T22:40:48.7144478Z 2023-01-11T22:40:48.7144600Z Generating XML reports... 2023-01-11T22:40:48.7145026Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224005.xml 2023-01-11T22:40:48.7145394Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7145572Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7145946Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7146119Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7146370Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpih251hxq 2023-01-11T22:40:48.7146636Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpih251hxq/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7146656Z 2023-01-11T22:40:48.7146764Z Running tests... 2023-01-11T22:40:48.7147025Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7147332Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.7147664Z test_sparse_allreduce_basics (__main__.ProcessGroupGlooTest) ... skip: intermittent failures on Windows, in CI (0.000s) 2023-01-11T22:40:48.7147688Z 2023-01-11T22:40:48.7147948Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7148060Z Ran 1 test in 0.001s 2023-01-11T22:40:48.7148079Z 2023-01-11T22:40:48.7148169Z OK (skipped=1) 2023-01-11T22:40:48.7148188Z 2023-01-11T22:40:48.7148309Z Generating XML reports... 
2023-01-11T22:40:48.7148733Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224012.xml 2023-01-11T22:40:48.7149099Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7149273Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7149647Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7149840Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7150140Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgjl9mj53 2023-01-11T22:40:48.7150393Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgjl9mj53/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7150430Z 2023-01-11T22:40:48.7150521Z Running tests... 2023-01-11T22:40:48.7150786Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7151091Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.7151355Z test_sparse_allreduce_basics_cuda (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.7151573Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 87980 2023-01-11T22:40:48.7151787Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 87981 2023-01-11T22:40:48.7152004Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 87982 2023-01-11T22:40:48.7152216Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 87983 2023-01-11T22:40:48.7152567Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7152740Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7153118Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7153307Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7153670Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7153843Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7154215Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7154401Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7154743Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7154918Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7155285Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7155470Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7155825Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7155995Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7156365Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7156607Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7156868Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkhswhme_ 2023-01-11T22:40:48.7157120Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkhswhme_/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7157371Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpq9rf1xga 2023-01-11T22:40:48.7157637Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpq9rf1xga/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7157887Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpl9bz19ox 2023-01-11T22:40:48.7158150Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpl9bz19ox/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7158373Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.7158600Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.7158902Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7pe6ovje 2023-01-11T22:40:48.7159171Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7pe6ovje/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7159378Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.7159601Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.7159708Z ok (6.003s) 
2023-01-11T22:40:48.7159728Z 2023-01-11T22:40:48.7159997Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7160110Z Ran 1 test in 6.003s 2023-01-11T22:40:48.7160129Z 2023-01-11T22:40:48.7160221Z OK 2023-01-11T22:40:48.7160240Z 2023-01-11T22:40:48.7160363Z Generating XML reports... 2023-01-11T22:40:48.7160795Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224014.xml 2023-01-11T22:40:48.7161150Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7161323Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7161698Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7161890Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7162140Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_3fqdxmq 2023-01-11T22:40:48.7162402Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_3fqdxmq/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7162422Z 2023-01-11T22:40:48.7162532Z Running tests... 2023-01-11T22:40:48.7162788Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7163082Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.7163340Z test_sparse_allreduce_checks (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.7163557Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 88347 2023-01-11T22:40:48.7163771Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 88348 2023-01-11T22:40:48.7163987Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 88349 2023-01-11T22:40:48.7164196Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 88350 2023-01-11T22:40:48.7164565Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7164738Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7165115Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7165341Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7165706Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7165880Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7166248Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7166431Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7166784Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7166954Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7167322Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7167490Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7167891Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7168066Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7168435Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7168622Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7168876Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjz8hqzd2 2023-01-11T22:40:48.7169142Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjz8hqzd2/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7169389Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgzymkubo 2023-01-11T22:40:48.7169659Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgzymkubo/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7169866Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:40:48.7170118Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp85iy6z8f 2023-01-11T22:40:48.7170377Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp85iy6z8f/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7170602Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:40:48.7170852Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpo8gmttew 2023-01-11T22:40:48.7171168Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpo8gmttew/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7171394Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:40:48.7171618Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:40:48.7171705Z ok (4.036s) 
2023-01-11T22:40:48.7171742Z 2023-01-11T22:40:48.7171998Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7172113Z Ran 1 test in 4.036s 2023-01-11T22:40:48.7172133Z 2023-01-11T22:40:48.7172226Z OK 2023-01-11T22:40:48.7172245Z 2023-01-11T22:40:48.7172369Z Generating XML reports... 2023-01-11T22:40:48.7172796Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224022.xml 2023-01-11T22:40:48.7173161Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7173335Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7173705Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7173877Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7174191Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpe_aqg6qw 2023-01-11T22:40:48.7174457Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpe_aqg6qw/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7174477Z 2023-01-11T22:40:48.7174586Z Running tests... 2023-01-11T22:40:48.7174853Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7175160Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.7175468Z test_forward_backward (__main__.ReducerTest) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.7175864Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 
2023-01-11T22:40:48.7175966Z ok (0.007s) 2023-01-11T22:40:48.7175986Z 2023-01-11T22:40:48.7176230Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7176342Z Ran 1 test in 0.012s 2023-01-11T22:40:48.7176361Z 2023-01-11T22:40:48.7176454Z OK 2023-01-11T22:40:48.7176473Z 2023-01-11T22:40:48.7176841Z Generating XML reports... 2023-01-11T22:40:48.7177273Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20230111224029.xml 2023-01-11T22:40:48.7177638Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7177812Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7178187Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7178359Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7178615Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpirauwniq 2023-01-11T22:40:48.7178887Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpirauwniq/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7178907Z 2023-01-11T22:40:48.7179015Z Running tests... 2023-01-11T22:40:48.7179277Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7179584Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.7179905Z test_forward_backward_optimizer (__main__.ReducerTest) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.7180301Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 
2023-01-11T22:40:48.7181079Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2023-01-11T22:40:48.7181185Z ok (0.011s) 2023-01-11T22:40:48.7181205Z 2023-01-11T22:40:48.7181465Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7181559Z Ran 1 test in 0.022s 2023-01-11T22:40:48.7181579Z 2023-01-11T22:40:48.7181671Z OK 2023-01-11T22:40:48.7181690Z 2023-01-11T22:40:48.7181813Z Generating XML reports... 2023-01-11T22:40:48.7182204Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20230111224031.xml 2023-01-11T22:40:48.7182570Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7182744Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7183202Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7183396Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7183633Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpofvmz2ri 2023-01-11T22:40:48.7183899Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpofvmz2ri/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7183919Z 2023-01-11T22:40:48.7184024Z Running tests... 
2023-01-11T22:40:48.7184284Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7184589Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.7184921Z test_forward_backward_unused_parameters (__main__.ReducerTest) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.7185321Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:40:48.7185421Z ok (0.009s) 2023-01-11T22:40:48.7185440Z 2023-01-11T22:40:48.7185744Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7185844Z Ran 1 test in 0.012s 2023-01-11T22:40:48.7185864Z 2023-01-11T22:40:48.7185957Z OK 2023-01-11T22:40:48.7185976Z 2023-01-11T22:40:48.7186100Z Generating XML reports... 2023-01-11T22:40:48.7186489Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20230111224033.xml 2023-01-11T22:40:48.7186852Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7187028Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7187400Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7187595Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7187849Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmph014amwk 2023-01-11T22:40:48.7188097Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmph014amwk/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7188117Z 2023-01-11T22:40:48.7188225Z Running tests... 
2023-01-11T22:40:48.7188486Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7188792Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.7189108Z test_multi_dtype_multi_bucket (__main__.ReducerTest) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.7189503Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:40:48.7189608Z ok (0.004s) 2023-01-11T22:40:48.7189628Z 2023-01-11T22:40:48.7189885Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7189979Z Ran 1 test in 0.011s 2023-01-11T22:40:48.7190016Z 2023-01-11T22:40:48.7190094Z OK 2023-01-11T22:40:48.7190113Z 2023-01-11T22:40:48.7190236Z Generating XML reports... 2023-01-11T22:40:48.7190626Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20230111224035.xml 2023-01-11T22:40:48.7190989Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7191163Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7191538Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7191726Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7191977Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnhpcnybh 2023-01-11T22:40:48.7192286Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnhpcnybh/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7192311Z 2023-01-11T22:40:48.7192419Z Running tests... 
2023-01-11T22:40:48.7192682Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7192989Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.7193307Z test_multi_dtype_single_bucket (__main__.ReducerTest) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.7193703Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:40:48.7193803Z ok (0.007s) 2023-01-11T22:40:48.7193823Z 2023-01-11T22:40:48.7194079Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7194175Z Ran 1 test in 0.011s 2023-01-11T22:40:48.7194211Z 2023-01-11T22:40:48.7194285Z OK 2023-01-11T22:40:48.7194303Z 2023-01-11T22:40:48.7194425Z Generating XML reports... 2023-01-11T22:40:48.7194859Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20230111224037.xml 2023-01-11T22:40:48.7195233Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7195406Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7195781Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7195971Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7196223Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7a7zswvx 2023-01-11T22:40:48.7196473Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7a7zswvx/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7196513Z 2023-01-11T22:40:48.7196605Z Running tests... 
2023-01-11T22:40:48.7196864Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7197174Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.7197492Z test_single_dtype_single_bucket (__main__.ReducerTest) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.7197886Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:40:48.7197988Z ok (0.004s) 2023-01-11T22:40:48.7198007Z 2023-01-11T22:40:48.7198262Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7198374Z Ran 1 test in 0.011s 2023-01-11T22:40:48.7198392Z 2023-01-11T22:40:48.7198466Z OK 2023-01-11T22:40:48.7198485Z 2023-01-11T22:40:48.7198611Z Generating XML reports... 2023-01-11T22:40:48.7199000Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20230111224040.xml 2023-01-11T22:40:48.7199371Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7199545Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7199918Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7200108Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7200361Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpy6wbcmmt 2023-01-11T22:40:48.7200609Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpy6wbcmmt/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7200646Z 2023-01-11T22:40:48.7200737Z Running tests... 
2023-01-11T22:40:48.7201057Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7201364Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.7201592Z test_logging_init (__main__.RendezvousEnvTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.7201831Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:40:48.7202227Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:40:48.7202329Z ok (1.636s) 2023-01-11T22:40:48.7202349Z 2023-01-11T22:40:48.7202607Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7202700Z Ran 1 test in 1.636s 2023-01-11T22:40:48.7202719Z 2023-01-11T22:40:48.7202809Z OK 2023-01-11T22:40:48.7202828Z 2023-01-11T22:40:48.7202950Z Generating XML reports... 2023-01-11T22:40:48.7203357Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-RendezvousEnvTest-20230111224042.xml 2023-01-11T22:40:48.7203726Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:40:48.7203946Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:40:48.7204328Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:40:48.7204518Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:40:48.7204755Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzyq0rwej 2023-01-11T22:40:48.7205022Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzyq0rwej/_remote_module_non_scriptable.py 2023-01-11T22:40:48.7205042Z 2023-01-11T22:40:48.7205149Z Running tests... 
2023-01-11T22:40:48.7205406Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7205715Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:40:48.7205948Z test_default_store_timeout_gloo (__main__.TimeoutTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:40:48.7206690Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/74714 for allplatform(s) . If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.652s) 2023-01-11T22:40:48.7206711Z 2023-01-11T22:40:48.7206966Z ---------------------------------------------------------------------- 2023-01-11T22:40:48.7207077Z Ran 1 test in 1.652s 2023-01-11T22:40:48.7207096Z 2023-01-11T22:40:48.7207203Z OK (skipped=1) 2023-01-11T22:40:48.7207222Z 2023-01-11T22:40:48.7207328Z Generating XML reports... 2023-01-11T22:40:48.7207717Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-TimeoutTest-20230111224046.xml 2023-01-11T22:40:48.7207740Z 2023-01-11T22:40:48.7208194Z ##[endgroup] 2023-01-11T22:40:48.7208622Z FINISHED PRINTING LOG FILE of distributed/test_c10d_gloo (/var/lib/jenkins/workspace/test/test-reports/distributed-test_c10d_gloo_oniv3k8o) 2023-01-11T22:40:48.7208642Z 2023-01-11T22:40:48.7208899Z Running distributed/fsdp/test_fsdp_core ... [2023-01-11 22:40:48.555458] 2023-01-11T22:40:48.7209362Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_fsdp_core.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... 
[2023-01-11 22:40:48.555808] 2023-01-11T22:51:00.1604739Z 2023-01-11T22:51:00.1605195Z Expand the folded group to see the log file of distributed/fsdp/test_fsdp_core 2023-01-11T22:51:00.1608235Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_fsdp_core (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_core_dgo9oq20) 2023-01-11T22:51:00.1653589Z 2023-01-11T22:51:00.1655565Z Running tests... 2023-01-11T22:51:00.1656465Z ---------------------------------------------------------------------- 2023-01-11T22:51:00.1657698Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_core 2023-01-11T22:51:00.1658184Z test_pre_backward_hook_registration_after_state_dict (__main__.TestHooks) 2023-01-11T22:51:00.1658775Z Tests that FSDP pre-backward hooks are registered on forward pass ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:51:00.1660235Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 88870 2023-01-11T22:51:00.1660698Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 88871 2023-01-11T22:51:00.1661344Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.1661803Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.1662378Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.1662832Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.1663549Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.1667832Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.1669118Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 
2023-01-11T22:51:00.1669856Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.1670849Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.1671788Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.1672980Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.1673704Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.1674234Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.1674703Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.1675949Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.1677302Z warnings.warn( 2023-01-11T22:51:00.1679618Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:51:00.1681348Z warnings.warn( 2023-01-11T22:51:00.1681819Z dist init r=0, world=2 2023-01-11T22:51:00.1682262Z dist init r=1, world=2 2023-01-11T22:51:00.1682705Z ok (6.529s) 2023-01-11T22:51:00.1683333Z test_pre_backward_hook_registration_cuda_first_False (__main__.TestHooks) 2023-01-11T22:51:00.1684692Z Tests that FSDP pre-backward hooks are registered on forward pass ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 88953 2023-01-11T22:51:00.1685849Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 88954 2023-01-11T22:51:00.1687029Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.1688147Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.1689646Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.1690594Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.1691774Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.1692631Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.1693754Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.1694561Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.1695312Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.1696186Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.1698216Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:51:00.1698948Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.1699563Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.1700342Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.1702763Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.1704181Z warnings.warn( 2023-01-11T22:51:00.1706340Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.1707719Z warnings.warn( 2023-01-11T22:51:00.1708136Z dist init r=0, world=2 2023-01-11T22:51:00.1708602Z dist init r=1, world=2 2023-01-11T22:51:00.1708982Z ok (4.812s) 2023-01-11T22:51:00.1709587Z test_pre_backward_hook_registration_cuda_first_True (__main__.TestHooks) 2023-01-11T22:51:00.1710821Z Tests that FSDP pre-backward hooks are registered on forward pass ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89036 2023-01-11T22:51:00.1711864Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89037 2023-01-11T22:51:00.1712974Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.1713789Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.1714945Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.1715767Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.1716820Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.1717809Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.1718901Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.1719896Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.1720833Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.1721769Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.1723022Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.1724229Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:51:00.1725165Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.1726035Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.1726708Z dist init r=0, world=2 2023-01-11T22:51:00.1727105Z dist init r=1, world=2 2023-01-11T22:51:00.1727573Z ok (4.812s) 2023-01-11T22:51:00.1728355Z test_register_functions_called_cuda_first_False_mixed_precision_False (__main__.TestHooks) 2023-01-11T22:51:00.1729321Z Tests that ``_register_{pre|post}_backward_hooks()`` are called ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89119 2023-01-11T22:51:00.1730285Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89120 2023-01-11T22:51:00.1731379Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.1732226Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.1733327Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.1734206Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.1735358Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.1736207Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.1737925Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.1738861Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.1739795Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.1740746Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to 
store for rank: 0 2023-01-11T22:51:00.1742006Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.1743276Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.1744245Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.1745072Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.1747572Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.1748978Z warnings.warn( 2023-01-11T22:51:00.1751114Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.1752682Z warnings.warn( 2023-01-11T22:51:00.1753133Z dist init r=0, world=2 2023-01-11T22:51:00.1753634Z dist init r=1, world=2 2023-01-11T22:51:00.1754066Z ok (4.812s) 2023-01-11T22:51:00.1754665Z test_register_functions_called_cuda_first_False_mixed_precision_True (__main__.TestHooks) 2023-01-11T22:51:00.1755669Z Tests that ``_register_{pre|post}_backward_hooks()`` are called ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89198 2023-01-11T22:51:00.1756629Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89199 2023-01-11T22:51:00.1757767Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.1758546Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.1759635Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.1760683Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.1761842Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.1762669Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.1763804Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.1764718Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.1765560Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.1766545Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.1767847Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.1769226Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:51:00.1770213Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.1771121Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.1773341Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:69: UserWarning: Both mixed precision and an `auto_wrap_policy` were specified for FSDP, where the wrapped module has batch norm submodules. The batch norm submodules will be wrapped as separate FSDP instances with mixed precision disabled since some batch norm kernels do not support low precision. 2023-01-11T22:51:00.1774767Z warnings.warn( 2023-01-11T22:51:00.1777160Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:69: UserWarning: Both mixed precision and an `auto_wrap_policy` were specified for FSDP, where the wrapped module has batch norm submodules. The batch norm submodules will be wrapped as separate FSDP instances with mixed precision disabled since some batch norm kernels do not support low precision. 2023-01-11T22:51:00.1778528Z warnings.warn( 2023-01-11T22:51:00.1780728Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.1782125Z warnings.warn( 2023-01-11T22:51:00.1784527Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. 
We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.1785939Z warnings.warn( 2023-01-11T22:51:00.1786385Z dist init r=1, world=2 2023-01-11T22:51:00.1786823Z dist init r=0, world=2 2023-01-11T22:51:00.1787254Z ok (4.812s) 2023-01-11T22:51:00.1787872Z test_register_functions_called_cuda_first_True_mixed_precision_False (__main__.TestHooks) 2023-01-11T22:51:00.1788888Z Tests that ``_register_{pre|post}_backward_hooks()`` are called ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89277 2023-01-11T22:51:00.1789851Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89278 2023-01-11T22:51:00.1790968Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.1791986Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.1793078Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.1793947Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.1795102Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.1796006Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.1797085Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.1798051Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.1798900Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.1799774Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.1801030Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.1802327Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.1803241Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.1804103Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.1804747Z dist init r=1, world=2 2023-01-11T22:51:00.1805229Z dist init r=0, world=2 2023-01-11T22:51:00.1805667Z ok (4.813s) 2023-01-11T22:51:00.1806274Z test_register_functions_called_cuda_first_True_mixed_precision_True (__main__.TestHooks) 2023-01-11T22:51:00.1807297Z Tests that ``_register_{pre|post}_backward_hooks()`` are called ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89356 2023-01-11T22:51:00.1808297Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89357 2023-01-11T22:51:00.1809417Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.1810221Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.1811293Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.1812166Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.1813192Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.1814000Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.1815241Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.1816071Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.1817253Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.1818234Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.1819531Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.1820939Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:51:00.1821912Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.1822831Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.1825184Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:69: UserWarning: Both mixed precision and an `auto_wrap_policy` were specified for FSDP, where the wrapped module has batch norm submodules. The batch norm submodules will be wrapped as separate FSDP instances with mixed precision disabled since some batch norm kernels do not support low precision. 2023-01-11T22:51:00.1826546Z warnings.warn( 2023-01-11T22:51:00.1828409Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:69: UserWarning: Both mixed precision and an `auto_wrap_policy` were specified for FSDP, where the wrapped module has batch norm submodules. The batch norm submodules will be wrapped as separate FSDP instances with mixed precision disabled since some batch norm kernels do not support low precision. 2023-01-11T22:51:00.1829702Z warnings.warn( 2023-01-11T22:51:00.1830139Z dist init r=1, world=2 2023-01-11T22:51:00.1830649Z dist init r=0, world=2 2023-01-11T22:51:00.1831016Z ok (4.713s) 2023-01-11T22:51:00.1831635Z test_transformer_no_grad_mixed_precision_False (__main__.TestNoGrad) 2023-01-11T22:51:00.1832825Z Tests that for an FSDP-wrapped transformer model with shared ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89435 2023-01-11T22:51:00.1833829Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89436 2023-01-11T22:51:00.1834916Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.1835750Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.1836885Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.1837831Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.1838800Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.1839438Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.1840196Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.1840692Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.1841424Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.1841926Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.1842668Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.1843504Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:51:00.1844616Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.1845530Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.1848118Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.1849600Z warnings.warn( 2023-01-11T22:51:00.1851896Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.1853582Z warnings.warn( 2023-01-11T22:51:00.1854052Z dist init r=0, world=2 2023-01-11T22:51:00.1854511Z dist init r=1, world=2 2023-01-11T22:51:00.1854901Z ok (4.812s) 2023-01-11T22:51:00.1855504Z test_transformer_no_grad_mixed_precision_True (__main__.TestNoGrad) 2023-01-11T22:51:00.1857186Z Tests that for an FSDP-wrapped transformer model with shared ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89518 2023-01-11T22:51:00.1858197Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89519 2023-01-11T22:51:00.1859404Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.1860222Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.1861320Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.1862156Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.1863205Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.1864007Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.1865039Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.1865907Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.1866727Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.1867606Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.1868831Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.1870090Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:51:00.1871157Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.1872030Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.1874125Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:69: UserWarning: Both mixed precision and an `auto_wrap_policy` were specified for FSDP, where the wrapped module has batch norm submodules. The batch norm submodules will be wrapped as separate FSDP instances with mixed precision disabled since some batch norm kernels do not support low precision. 2023-01-11T22:51:00.1875411Z warnings.warn( 2023-01-11T22:51:00.1877290Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:69: UserWarning: Both mixed precision and an `auto_wrap_policy` were specified for FSDP, where the wrapped module has batch norm submodules. The batch norm submodules will be wrapped as separate FSDP instances with mixed precision disabled since some batch norm kernels do not support low precision. 2023-01-11T22:51:00.1878691Z warnings.warn( 2023-01-11T22:51:00.1880998Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.1882518Z warnings.warn( 2023-01-11T22:51:00.1884898Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. 
We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.1886322Z warnings.warn( 2023-01-11T22:51:00.1886741Z dist init r=0, world=2 2023-01-11T22:51:00.1887224Z dist init r=1, world=2 2023-01-11T22:51:00.1887647Z ok (4.812s) 2023-01-11T22:51:00.1888231Z test_param_change_after_init_mixed_precision_False (__main__.TestParamInit) 2023-01-11T22:51:00.1889527Z Tests that changing FSDP model parameter values in-place after FSDP ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89601 2023-01-11T22:51:00.1890518Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89602 2023-01-11T22:51:00.1891630Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.1892466Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.1893564Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.1894400Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.1895468Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.1896272Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.1897774Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.1898645Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.1899524Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.1900416Z INFO:torch.distributed.distributed_c10d:Added 
key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.1901597Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.1902892Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.1903805Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.1904763Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.1907244Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.1908850Z warnings.warn( 2023-01-11T22:51:00.1910983Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:51:00.1912448Z warnings.warn( 2023-01-11T22:51:00.1912828Z dist init r=1, world=2 2023-01-11T22:51:00.1913276Z dist init r=0, world=2 2023-01-11T22:51:00.1913689Z ok (4.712s) 2023-01-11T22:51:00.1914268Z test_param_change_after_init_mixed_precision_True (__main__.TestParamInit) 2023-01-11T22:51:00.1915560Z Tests that changing FSDP model parameter values in-place after FSDP ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89680 2023-01-11T22:51:00.1916672Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89681 2023-01-11T22:51:00.1917818Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.1918650Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.1919720Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.1920583Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.1921641Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.1922438Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.1923525Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.1924379Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.1925175Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.1926074Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.1927387Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:51:00.1928742Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.1929753Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.1930677Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.1932947Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:69: UserWarning: Both mixed precision and an `auto_wrap_policy` were specified for FSDP, where the wrapped module has batch norm submodules. The batch norm submodules will be wrapped as separate FSDP instances with mixed precision disabled since some batch norm kernels do not support low precision. 2023-01-11T22:51:00.1934263Z warnings.warn( 2023-01-11T22:51:00.1936428Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.1938291Z warnings.warn( 2023-01-11T22:51:00.1940416Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:69: UserWarning: Both mixed precision and an `auto_wrap_policy` were specified for FSDP, where the wrapped module has batch norm submodules. The batch norm submodules will be wrapped as separate FSDP instances with mixed precision disabled since some batch norm kernels do not support low precision. 
2023-01-11T22:51:00.1941661Z warnings.warn( 2023-01-11T22:51:00.1943869Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.1945255Z warnings.warn( 2023-01-11T22:51:00.1945702Z dist init r=1, world=2 2023-01-11T22:51:00.1946135Z dist init r=0, world=2 2023-01-11T22:51:00.1946574Z ok (4.812s) 2023-01-11T22:51:00.1947131Z test_delayed_optim_step_offload_false_no_shard (__main__.TestParityWithDDP) 2023-01-11T22:51:00.1948221Z Tests the FSDP forward, backward, and optimizer step runtime by ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89759 2023-01-11T22:51:00.1949191Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89760 2023-01-11T22:51:00.1950328Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.1951106Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.1952161Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.1953026Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.1954172Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.1955072Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.1956183Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 
2023-01-11T22:51:00.1957065Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.1957851Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.1958744Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.1959937Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.1961234Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.1962159Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.1963050Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.1963907Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.1964780Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.1967200Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.1968618Z warnings.warn( 2023-01-11T22:51:00.1970920Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. 
We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.1972333Z warnings.warn( 2023-01-11T22:51:00.1973000Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.1973833Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.1974768Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.1975634Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.1976931Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.1977781Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.1978779Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.1979677Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.1980496Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.1981326Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.1982197Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.1983055Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.1983896Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.1984792Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.1985679Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.1986566Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.1988412Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.1990742Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.1993073Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.1995570Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.1998003Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.2000505Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2002784Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2005060Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2007381Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2009729Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2012102Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2014377Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2017181Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2019635Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2022008Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2024284Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2026555Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2028948Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2031220Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2033440Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2035892Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2038257Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.2040680Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2043009Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2045283Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2047613Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2049888Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2052220Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2054454Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2057282Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2059698Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2062200Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2064783Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2067071Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2069388Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2071649Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2073954Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2076285Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2078501Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.2080874Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2083326Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2085756Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2088104Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2090415Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2091781Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2092651Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.2093574Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2094466Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2095333Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2096264Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2097632Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2098509Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2099376Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2100221Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2101084Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2101967Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2102816Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2103696Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2104572Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2105418Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2106265Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2107134Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2108031Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.2108881Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2109725Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2110620Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2111486Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2112332Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2114192Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2116701Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2118942Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2121206Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.2123724Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2126302Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2128856Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2131576Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2133749Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2135304Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2138191Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2140429Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2142759Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2145208Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2147817Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2150500Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2153369Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2156096Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2158702Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2160751Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2163178Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.2165472Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2167778Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2170073Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2172563Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2175157Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2177827Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2180129Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2182732Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2185241Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2187578Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2189964Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2192369Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2194778Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2197158Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2199522Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2201807Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2204293Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.2206655Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2208967Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2211304Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2213597Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2215970Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2218242Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2219456Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2220658Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2221880Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2223086Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2224273Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2225601Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2226810Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2228009Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2229280Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2230498Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2231701Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.2232904Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2234108Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2235293Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2236499Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2237695Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2238893Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2240091Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2240871Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2241356Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2241816Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2242286Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2242754Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2243205Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2243569Z dist init r=0, world=2 2023-01-11T22:51:00.2243822Z dist init r=1, world=2 2023-01-11T22:51:00.2244048Z ok (20.135s) 2023-01-11T22:51:00.2244381Z test_delayed_optim_step_offload_false_none (__main__.TestParityWithDDP) 2023-01-11T22:51:00.2244962Z Tests the FSDP forward, backward, and optimizer step runtime by ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89842 2023-01-11T22:51:00.2245499Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89843 2023-01-11T22:51:00.2246093Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.2246541Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.2247116Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.2247565Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.2248151Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.2248589Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.2249162Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.2249609Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.2250059Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.2250552Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.2251205Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.2251867Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:51:00.2252391Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.2252859Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.2253322Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2253801Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2255064Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.2255831Z warnings.warn( 2023-01-11T22:51:00.2257353Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.2258129Z warnings.warn( 2023-01-11T22:51:00.2258484Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2258962Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2259437Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.2259893Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2260373Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2260842Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2261391Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2261857Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2262325Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2262788Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2263234Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2263696Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2264159Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2264627Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2265077Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2265544Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2266542Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2267841Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2269067Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2270266Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2271473Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2272776Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2273985Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.2275192Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2275913Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2276393Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2276902Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2277379Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2277843Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2295039Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2295523Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2295986Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2296450Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2297174Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2297645Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2298104Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2298547Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2299012Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.2299478Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2299937Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2300381Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2300840Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2301299Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2301762Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2302205Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2302672Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2303148Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2303610Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2304630Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2305977Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2307191Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2308389Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2309652Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2310868Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2312074Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2313284Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.2314477Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2315981Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_exec_order_utils.py:237: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:51:00.2316797Z (rank, world_num_valid_indices[rank]) 2023-01-11T22:51:00.2317197Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2317658Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2318128Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2318600Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2319049Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2319514Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2319870Z dist init r=1, world=2 2023-01-11T22:51:00.2320162Z dist init r=0, world=2 2023-01-11T22:51:00.2320398Z ok (29.450s) 2023-01-11T22:51:00.2320741Z test_delayed_optim_step_offload_false_shard_grad_op (__main__.TestParityWithDDP) 2023-01-11T22:51:00.2321284Z Tests the FSDP forward, backward, and optimizer step runtime by ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89925 2023-01-11T22:51:00.2321793Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89926 2023-01-11T22:51:00.2322399Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.2322843Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.2323392Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.2323856Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.2324430Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.2324875Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.2325486Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.2325961Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.2326407Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.2326898Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.2327530Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.2328207Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:51:00.2328725Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.2329171Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.2329645Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2330124Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2331387Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.2332157Z warnings.warn( 2023-01-11T22:51:00.2333295Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.2334059Z warnings.warn( 2023-01-11T22:51:00.2334425Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2334902Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2335357Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.2335827Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2336290Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2449955Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2450632Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2451097Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2451570Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2452036Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2452494Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2452939Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2453397Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2453866Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2454314Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2455040Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2456161Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2457771Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2459004Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2460236Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2461438Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2462667Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2463878Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.2465094Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2465945Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2466420Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2466879Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2467351Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2467815Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2468277Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2468730Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2469191Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2469654Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2470102Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2470634Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2471107Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2471565Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2472017Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.2472468Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2472923Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2473371Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2473834Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2474300Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2474758Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2475199Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2475652Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2476109Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2476564Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2477551Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2478788Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2480019Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2481236Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2482582Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2483788Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2485003Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2486246Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.2487474Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2489007Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_exec_order_utils.py:237: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:51:00.2489819Z (rank, world_num_valid_indices[rank]) 2023-01-11T22:51:00.2490205Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2490679Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2491136Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2491596Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2492130Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2492578Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2492929Z dist init r=0, world=2 2023-01-11T22:51:00.2493162Z dist init r=1, world=2 2023-01-11T22:51:00.2493377Z ok (29.449s) 2023-01-11T22:51:00.2493695Z test_delayed_optim_step_offload_true_no_shard (__main__.TestParityWithDDP) 2023-01-11T22:51:00.2494832Z Tests the FSDP forward, backward, and optimizer step runtime by ... 
skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/82490 for platform(s) linux, rocm. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (0.001s) 2023-01-11T22:51:00.2495599Z test_delayed_optim_step_offload_true_none (__main__.TestParityWithDDP) 2023-01-11T22:51:00.2496125Z Tests the FSDP forward, backward, and optimizer step runtime by ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 90008 2023-01-11T22:51:00.2496893Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 90009 2023-01-11T22:51:00.2497628Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.2498071Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.2498632Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.2499102Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.2499674Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.2500104Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.2500646Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.2501098Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.2501538Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.2502022Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.2502726Z INFO:torch.distributed.distributed_c10d:Rank 
0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.2503420Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.2503932Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.2504379Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.2504844Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2505312Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2506585Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.2507359Z warnings.warn( 2023-01-11T22:51:00.2508491Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:51:00.2509247Z warnings.warn( 2023-01-11T22:51:00.2509510Z File "", line 1, in 2023-01-11T22:51:00.2509873Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.2510224Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.2510577Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.2510938Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.2511300Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.2511623Z self.run() 2023-01-11T22:51:00.2511948Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.2512284Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.2512791Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.2513168Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.2513687Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.2514123Z getattr(self, test_name)() 2023-01-11T22:51:00.2514635Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.2514989Z fn() 2023-01-11T22:51:00.2515453Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.2515834Z test(self, **param_kwargs) 2023-01-11T22:51:00.2516336Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.2516709Z return func(*args, **kwargs) 2023-01-11T22:51:00.2517082Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.2517439Z self.run_subtests( 2023-01-11T22:51:00.2517931Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.2518333Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.2518925Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.2519344Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.2519893Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.2520262Z output = model(*input) 2023-01-11T22:51:00.2520725Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.2521096Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.2521620Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.2522058Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.2522615Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.2522996Z _lazy_init(state, module) 2023-01-11T22:51:00.2523479Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.2523898Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.2524473Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.2524871Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.2525369Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.2525731Z return func(*args, **kwargs) 2023-01-11T22:51:00.2526255Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.2526618Z p_assert( 2023-01-11T22:51:00.2527084Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.2527455Z traceback.print_stack() 2023-01-11T22:51:00.2527715Z File "", line 1, in 2023-01-11T22:51:00.2528069Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.2528423Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.2528767Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.2529126Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.2529503Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.2529823Z self.run() 2023-01-11T22:51:00.2530134Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.2530490Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.2531060Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.2531424Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.2531940Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.2532322Z getattr(self, test_name)() 2023-01-11T22:51:00.2532815Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.2533164Z fn() 2023-01-11T22:51:00.2533647Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.2534027Z test(self, **param_kwargs) 2023-01-11T22:51:00.2534515Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.2534892Z return func(*args, **kwargs) 2023-01-11T22:51:00.2535289Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.2535634Z self.run_subtests( 2023-01-11T22:51:00.2536177Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.2536826Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.2537392Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.2537786Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.2538326Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.2538703Z output = model(*input) 2023-01-11T22:51:00.2539157Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.2539528Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.2540058Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.2540497Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.2541030Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.2541405Z _lazy_init(state, module) 2023-01-11T22:51:00.2541900Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.2542308Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.2542889Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.2543309Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.2543817Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.2544171Z return func(*args, **kwargs) 2023-01-11T22:51:00.2544707Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.2545080Z p_assert( 2023-01-11T22:51:00.2545525Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.2545893Z traceback.print_stack() 2023-01-11T22:51:00.2546274Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2546755Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2547119Z File "", line 1, in 2023-01-11T22:51:00.2547485Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.2547846Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.2548298Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.2548655Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.2549041Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.2549351Z self.run() 2023-01-11T22:51:00.2549673Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.2550025Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.2550533Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.2550903Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.2551429Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.2551825Z getattr(self, test_name)() 2023-01-11T22:51:00.2552322Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.2552679Z fn() 2023-01-11T22:51:00.2553164Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.2553615Z test(self, **param_kwargs) 2023-01-11T22:51:00.2554143Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.2554531Z return func(*args, **kwargs) 2023-01-11T22:51:00.2554927Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.2555275Z self.run_subtests( 2023-01-11T22:51:00.2555535Z File "", line 1, in 2023-01-11T22:51:00.2556037Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.2556430Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.2556973Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.2557386Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.2557772Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.2558120Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.2558659Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.2559045Z output = model(*input) 2023-01-11T22:51:00.2559374Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.2559730Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.2560220Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.2560580Z return forward_call(*args, **kwargs) 
2023-01-11T22:51:00.2560952Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.2561285Z self.run() 2023-01-11T22:51:00.2561787Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.2562215Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.2562598Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.2562960Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.2563478Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.2563859Z _lazy_init(state, module) 2023-01-11T22:51:00.2564339Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.2564719Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.2565214Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.2565750Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.2566300Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.2566730Z getattr(self, test_name)() 2023-01-11T22:51:00.2567291Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.2567721Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.2568251Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.2568591Z fn() 2023-01-11T22:51:00.2569045Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.2569410Z return func(*args, **kwargs) 
2023-01-11T22:51:00.2569912Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.2570302Z test(self, **param_kwargs) 2023-01-11T22:51:00.2570880Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.2571250Z p_assert( 2023-01-11T22:51:00.2571744Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.2572129Z return func(*args, **kwargs) 2023-01-11T22:51:00.2572611Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.2572971Z traceback.print_stack() 2023-01-11T22:51:00.2573369Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.2573734Z self.run_subtests( 2023-01-11T22:51:00.2574214Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.2574635Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.2575180Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.2575593Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.2576126Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.2576511Z output = model(*input) 2023-01-11T22:51:00.2577237Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.2577604Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.2578153Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.2578605Z 
args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.2579171Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.2579542Z _lazy_init(state, module) 2023-01-11T22:51:00.2580043Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.2580462Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.2581028Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.2581453Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.2581961Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.2582332Z return func(*args, **kwargs) 2023-01-11T22:51:00.2582844Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.2583314Z p_assert( 2023-01-11T22:51:00.2583785Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.2584152Z traceback.print_stack() 2023-01-11T22:51:00.2584544Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2585022Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.2585393Z File "", line 1, in 2023-01-11T22:51:00.2585955Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.2586318Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.2586690Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.2587045Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.2587429Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.2587759Z self.run() 2023-01-11T22:51:00.2588069Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.2588500Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.2589034Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.2589415Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.2589918Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.2590301Z getattr(self, test_name)() 2023-01-11T22:51:00.2590798Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.2591140Z fn() 2023-01-11T22:51:00.2591621Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.2592062Z test(self, **param_kwargs) 2023-01-11T22:51:00.2592555Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.2592944Z return func(*args, **kwargs) 2023-01-11T22:51:00.2593337Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.2593702Z self.run_subtests( 2023-01-11T22:51:00.2594178Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.2594595Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.2595135Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.2595531Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.2596076Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.2596468Z output = model(*input) 2023-01-11T22:51:00.2596940Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.2597303Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.2597841Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.2598282Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.2598820Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.2599210Z _lazy_init(state, module) 2023-01-11T22:51:00.2599709Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.2600134Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.2600697Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.2601210Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.2601726Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.2602081Z return func(*args, **kwargs) 2023-01-11T22:51:00.2602608Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.2602979Z p_assert( 2023-01-11T22:51:00.2603437Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.2603791Z traceback.print_stack() 2023-01-11T22:51:00.2604070Z File "", line 1, in 2023-01-11T22:51:00.2604429Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.2604780Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.2605145Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.2605510Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.2605925Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.2606266Z self.run() 2023-01-11T22:51:00.2606605Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.2606968Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.2607463Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.2607846Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.2608369Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.2608742Z getattr(self, test_name)() 2023-01-11T22:51:00.2609247Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.2609611Z fn() 2023-01-11T22:51:00.2610083Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.2610468Z test(self, **param_kwargs) 2023-01-11T22:51:00.2610974Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.2611355Z return func(*args, **kwargs) 2023-01-11T22:51:00.2611738Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.2612106Z self.run_subtests( 2023-01-11T22:51:00.2612598Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.2612997Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.2613538Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.2613960Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.2614506Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.2614877Z output = model(*input) 2023-01-11T22:51:00.2615343Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.2615720Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.2616241Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.2616924Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.2617505Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.2617894Z _lazy_init(state, module) 2023-01-11T22:51:00.2618477Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.2618910Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.2619507Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.2619916Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.2620427Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.2620800Z return func(*args, **kwargs) 2023-01-11T22:51:00.2621329Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.2621688Z p_assert( 2023-01-11T22:51:00.2622153Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.2622526Z traceback.print_stack() 2023-01-11T22:51:00.2622906Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2623455Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2623847Z File "", line 1, in 2023-01-11T22:51:00.2624214Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.2624566Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.2624932Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.2625302Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.2625668Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.2625992Z self.run() 2023-01-11T22:51:00.2626316Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.2626659Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.2627176Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.2627564Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.2628091Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.2628459Z getattr(self, test_name)() 2023-01-11T22:51:00.2628733Z 
File "", line 1, in 2023-01-11T22:51:00.2629248Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.2629593Z fn() 2023-01-11T22:51:00.2630080Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.2630468Z test(self, **param_kwargs) 2023-01-11T22:51:00.2630806Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.2631174Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.2631701Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.2632087Z return func(*args, **kwargs) 2023-01-11T22:51:00.2632422Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.2632787Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.2633212Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.2633567Z self.run_subtests( 2023-01-11T22:51:00.2633923Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.2634250Z self.run() 2023-01-11T22:51:00.2634720Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.2635135Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.2635582Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.2635950Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.2636475Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.2636892Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.2637405Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.2637769Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.2638305Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.2638698Z output = model(*input) 2023-01-11T22:51:00.2639212Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.2639581Z getattr(self, test_name)() 2023-01-11T22:51:00.2640061Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.2640437Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.2640992Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.2641359Z fn() 2023-01-11T22:51:00.2641856Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.2642298Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.2642834Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.2643228Z test(self, **param_kwargs) 2023-01-11T22:51:00.2643740Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.2644109Z _lazy_init(state, module) 2023-01-11T22:51:00.2644620Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.2645005Z return func(*args, **kwargs) 2023-01-11T22:51:00.2645550Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.2646000Z 
_share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.2646485Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.2646901Z self.run_subtests( 2023-01-11T22:51:00.2647458Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.2647953Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.2648511Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.2649035Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.2649581Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.2649955Z return func(*args, **kwargs) 2023-01-11T22:51:00.2650465Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.2650863Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.2651413Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.2651788Z p_assert( 2023-01-11T22:51:00.2652276Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.2652662Z output = model(*input) 2023-01-11T22:51:00.2653148Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.2653589Z traceback.print_stack() 2023-01-11T22:51:00.2654055Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.2654442Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.2654989Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.2655424Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.2655981Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.2656364Z _lazy_init(state, module) 2023-01-11T22:51:00.2657149Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.2657565Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.2658159Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.2658584Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.2659160Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.2659556Z return func(*args, **kwargs) 2023-01-11T22:51:00.2660091Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.2660472Z p_assert( 2023-01-11T22:51:00.2660920Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.2661294Z traceback.print_stack() 2023-01-11T22:51:00.2661682Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2662147Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.2662529Z File "", line 1, in 2023-01-11T22:51:00.2662896Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.2663268Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.2663618Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.2663983Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.2664370Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.2664685Z self.run() 2023-01-11T22:51:00.2665017Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.2665374Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.2665866Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.2666252Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.2666780Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.2667169Z getattr(self, test_name)() 2023-01-11T22:51:00.2667663Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.2668030Z fn() 2023-01-11T22:51:00.2668514Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.2668638Z test(self, **param_kwargs) 2023-01-11T22:51:00.2668978Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.2669102Z return func(*args, **kwargs) 2023-01-11T22:51:00.2669351Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.2669464Z self.run_subtests( 2023-01-11T22:51:00.2669816Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.2670066Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.2670441Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.2670596Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.2670956Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.2671079Z output = model(*input) 2023-01-11T22:51:00.2671403Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.2671539Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.2671921Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.2672095Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.2672466Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.2672587Z _lazy_init(state, module) 2023-01-11T22:51:00.2672973Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.2673150Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.2673556Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.2673701Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.2674040Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.2674165Z return func(*args, **kwargs) 2023-01-11T22:51:00.2674544Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.2674650Z p_assert( 2023-01-11T22:51:00.2674968Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.2675096Z traceback.print_stack() 2023-01-11T22:51:00.2675225Z File "", line 1, in 2023-01-11T22:51:00.2675436Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.2675576Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.2675777Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.2675927Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.2676121Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.2676225Z self.run() 2023-01-11T22:51:00.2676427Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.2676571Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.2676915Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.2677047Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.2677414Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.2677539Z getattr(self, test_name)() 2023-01-11T22:51:00.2677881Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.2677982Z fn() 2023-01-11T22:51:00.2678345Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.2678470Z test(self, **param_kwargs) 2023-01-11T22:51:00.2678821Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.2678946Z return func(*args, **kwargs) 2023-01-11T22:51:00.2679262Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.2679379Z self.run_subtests( 2023-01-11T22:51:00.2679723Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.2679886Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.2680252Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.2680407Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.2680781Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.2680900Z output = model(*input) 2023-01-11T22:51:00.2681226Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.2681367Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.2681726Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.2681949Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.2682325Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.2682446Z _lazy_init(state, module) 2023-01-11T22:51:00.2682802Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.2682970Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.2683372Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.2683515Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.2683836Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.2683964Z return func(*args, **kwargs) 2023-01-11T22:51:00.2684345Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.2684447Z p_assert( 2023-01-11T22:51:00.2684781Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.2684907Z traceback.print_stack() 2023-01-11T22:51:00.2685147Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2685380Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2685493Z File "", line 1, in 2023-01-11T22:51:00.2685703Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.2685845Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.2686048Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.2686198Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.2686331Z File "", line 1, in 2023-01-11T22:51:00.2686544Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.2686646Z self.run() 2023-01-11T22:51:00.2686832Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.2686977Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.2687183Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.2687320Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.2687667Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.2687802Z self.run_test(test_name, pipe) 
2023-01-11T22:51:00.2688007Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.2688199Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.2688572Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.2688699Z getattr(self, test_name)() 2023-01-11T22:51:00.2688912Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.2689016Z self.run() 2023-01-11T22:51:00.2689379Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.2689480Z fn() 2023-01-11T22:51:00.2689684Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.2689813Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.2690182Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.2690307Z test(self, **param_kwargs) 2023-01-11T22:51:00.2690644Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.2690825Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.2691199Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.2691326Z return func(*args, **kwargs) 2023-01-11T22:51:00.2691684Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.2691791Z getattr(self, test_name)() 2023-01-11T22:51:00.2692099Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.2692218Z self.run_subtests( 2023-01-11T22:51:00.2692582Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", 
line 536, in wrapper 2023-01-11T22:51:00.2692685Z fn() 2023-01-11T22:51:00.2693038Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.2693202Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.2693571Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.2693678Z test(self, **param_kwargs) 2023-01-11T22:51:00.2694035Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.2694188Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.2694545Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.2694668Z return func(*args, **kwargs) 2023-01-11T22:51:00.2695041Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.2695162Z output = model(*input) 2023-01-11T22:51:00.2695412Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.2695512Z self.run_subtests( 2023-01-11T22:51:00.2695839Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.2695976Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.2696326Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.2696488Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.2697118Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.2697300Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.2697669Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.2697912Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.2698290Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.2698412Z _lazy_init(state, module) 2023-01-11T22:51:00.2698792Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.2698913Z output = model(*input) 2023-01-11T22:51:00.2699267Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.2699435Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.2699760Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.2699881Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.2700286Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.2700429Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.2700865Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.2701050Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.2701395Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.2701523Z return func(*args, **kwargs) 2023-01-11T22:51:00.2701894Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.2701998Z _lazy_init(state, module) 2023-01-11T22:51:00.2702378Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", 
line 775, in init_flat_param_attributes 2023-01-11T22:51:00.2702486Z p_assert( 2023-01-11T22:51:00.2702840Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.2703010Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.2703347Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.2703474Z traceback.print_stack() 2023-01-11T22:51:00.2703869Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.2703994Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.2704331Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.2704453Z return func(*args, **kwargs) 2023-01-11T22:51:00.2704834Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.2704938Z p_assert( 2023-01-11T22:51:00.2705270Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.2705398Z traceback.print_stack() 2023-01-11T22:51:00.2705635Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2705854Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.2705984Z File "", line 1, in 2023-01-11T22:51:00.2706197Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.2706339Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.2706542Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.2706692Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.2706903Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.2707070Z self.run() 2023-01-11T22:51:00.2707254Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.2707405Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.2707754Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.2707891Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.2708256Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.2708381Z getattr(self, test_name)() 2023-01-11T22:51:00.2708742Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.2708823Z fn() 2023-01-11T22:51:00.2709191Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.2709317Z test(self, **param_kwargs) 2023-01-11T22:51:00.2709670Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.2709841Z return func(*args, **kwargs) 2023-01-11T22:51:00.2710103Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.2710220Z self.run_subtests( 2023-01-11T22:51:00.2710576Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.2710722Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.2711087Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.2711236Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.2711613Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.2711736Z output = model(*input) 2023-01-11T22:51:00.2712060Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.2712201Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.2712575Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.2712734Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.2713099Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.2713220Z _lazy_init(state, module) 2023-01-11T22:51:00.2713570Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.2713741Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.2714142Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.2714285Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.2714625Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.2714750Z return func(*args, **kwargs) 2023-01-11T22:51:00.2715111Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.2715213Z p_assert( 2023-01-11T22:51:00.2715547Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.2715673Z traceback.print_stack() 2023-01-11T22:51:00.2715801Z File "", line 1, in 2023-01-11T22:51:00.2716010Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.2716151Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.2716393Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.2716546Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.2716764Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.2716868Z self.run() 2023-01-11T22:51:00.2717071Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.2717221Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.2717564Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.2717696Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.2718040Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.2718164Z getattr(self, test_name)() 2023-01-11T22:51:00.2718528Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.2718631Z fn() 2023-01-11T22:51:00.2719045Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.2719176Z test(self, **param_kwargs) 2023-01-11T22:51:00.2719536Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.2719660Z return func(*args, **kwargs) 2023-01-11T22:51:00.2719893Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.2720009Z self.run_subtests( 2023-01-11T22:51:00.2720363Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.2720526Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.2720896Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.2721052Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.2721427Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.2721547Z output = model(*input) 2023-01-11T22:51:00.2721855Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.2721991Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.2722364Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.2722537Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.2722902Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.2723024Z _lazy_init(state, module) 2023-01-11T22:51:00.2723381Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.2723551Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.2723936Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.2724079Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.2724416Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.2724540Z return func(*args, **kwargs) 2023-01-11T22:51:00.2724919Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.2725021Z p_assert( 2023-01-11T22:51:00.2725356Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.2725567Z traceback.print_stack() 2023-01-11T22:51:00.2725788Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2726030Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2726163Z File "", line 1, in 2023-01-11T22:51:00.2726372Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.2726512Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.2726713Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.2726863Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.2727075Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.2727161Z self.run() 2023-01-11T22:51:00.2727361Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.2727505Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.2727853Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.2727986Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.2728400Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.2728531Z getattr(self, test_name)() 2023-01-11T22:51:00.2728642Z 
File "", line 1, in 2023-01-11T22:51:00.2729005Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.2729103Z fn() 2023-01-11T22:51:00.2729467Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.2729589Z test(self, **param_kwargs) 2023-01-11T22:51:00.2729798Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.2729944Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.2730306Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.2730416Z return func(*args, **kwargs) 2023-01-11T22:51:00.2730620Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.2730774Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.2731024Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.2731139Z self.run_subtests( 2023-01-11T22:51:00.2731354Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.2731456Z self.run() 2023-01-11T22:51:00.2731808Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.2731953Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.2732160Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.2732305Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.2732678Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.2732830Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.2733167Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.2733300Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.2733673Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.2733776Z output = model(*input) 2023-01-11T22:51:00.2734134Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.2734256Z getattr(self, test_name)() 2023-01-11T22:51:00.2734642Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.2734783Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.2735147Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.2735248Z fn() 2023-01-11T22:51:00.2735604Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.2735778Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.2736141Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.2736263Z test(self, **param_kwargs) 2023-01-11T22:51:00.2736861Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.2737004Z _lazy_init(state, module) 2023-01-11T22:51:00.2737383Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.2737598Z return func(*args, **kwargs) 2023-01-11T22:51:00.2737951Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.2738122Z 
_share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.2738374Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.2738488Z self.run_subtests( 2023-01-11T22:51:00.2738888Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.2739032Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.2739384Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.2739550Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.2739890Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.2739998Z return func(*args, **kwargs) 2023-01-11T22:51:00.2740361Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.2740514Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.2740893Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.2740997Z p_assert( 2023-01-11T22:51:00.2741371Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.2741490Z output = model(*input) 2023-01-11T22:51:00.2741825Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.2741937Z traceback.print_stack() 2023-01-11T22:51:00.2742263Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.2742400Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.2742777Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.2742953Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.2743317Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.2743438Z _lazy_init(state, module) 2023-01-11T22:51:00.2743790Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.2743942Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.2744426Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.2744568Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.2744912Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.2745038Z return func(*args, **kwargs) 2023-01-11T22:51:00.2745415Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.2745517Z p_assert( 2023-01-11T22:51:00.2745853Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.2745962Z traceback.print_stack() 2023-01-11T22:51:00.2746198Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2746435Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2747247Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.2747995Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2748749Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2749494Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2750241Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2750970Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2751717Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2752448Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2752579Z File "", line 1, in 2023-01-11T22:51:00.2752792Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.2752936Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.2753139Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.2753328Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.2753543Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.2753653Z self.run() 2023-01-11T22:51:00.2753859Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.2754010Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.2754355Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.2754490Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.2754836Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.2754960Z getattr(self, test_name)() 2023-01-11T22:51:00.2755321Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.2755426Z fn() 2023-01-11T22:51:00.2755795Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.2755917Z test(self, **param_kwargs) 2023-01-11T22:51:00.2756363Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.2756498Z return func(*args, **kwargs) 2023-01-11T22:51:00.2756734Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.2756849Z self.run_subtests( 2023-01-11T22:51:00.2757206Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.2757367Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.2757734Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.2757895Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.2758274Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.2758397Z output = model(*input) 2023-01-11T22:51:00.2758707Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.2758846Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.2759225Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.2759403Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.2759777Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.2759901Z _lazy_init(state, module) 2023-01-11T22:51:00.2760253Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.2760425Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.2760827Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, 
in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.2760953Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.2761290Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.2761415Z return func(*args, **kwargs) 2023-01-11T22:51:00.2761793Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.2761895Z p_assert( 2023-01-11T22:51:00.2762229Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.2762354Z traceback.print_stack() 2023-01-11T22:51:00.2762466Z File "", line 1, in 2023-01-11T22:51:00.2762748Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.2762893Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.2763099Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.2763253Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.2763467Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.2763570Z self.run() 2023-01-11T22:51:00.2763767Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.2763896Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.2764236Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.2764372Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.2764740Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.2764868Z getattr(self, test_name)() 2023-01-11T22:51:00.2765276Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.2765380Z fn() 
2023-01-11T22:51:00.2765753Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.2765858Z test(self, **param_kwargs) 2023-01-11T22:51:00.2766211Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.2766334Z return func(*args, **kwargs) 2023-01-11T22:51:00.2766584Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.2766696Z self.run_subtests( 2023-01-11T22:51:00.2767046Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.2767213Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.2767579Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.2767715Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.2768089Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.2768207Z output = model(*input) 2023-01-11T22:51:00.2768532Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.2768671Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.2769047Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.2769221Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.2769586Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.2769694Z _lazy_init(state, module) 2023-01-11T22:51:00.2770049Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.2770218Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.2770618Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.2770760Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.2771097Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.2771222Z return func(*args, **kwargs) 2023-01-11T22:51:00.2771601Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.2771743Z p_assert( 2023-01-11T22:51:00.2772083Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.2772212Z traceback.print_stack() 2023-01-11T22:51:00.2772456Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2772691Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.2772820Z File "", line 1, in 2023-01-11T22:51:00.2772945Z File "", line 1, in 2023-01-11T22:51:00.2773158Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.2773282Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.2773482Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.2773632Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.2773843Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.2773985Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.2774198Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.2774348Z self.run() 2023-01-11T22:51:00.2774540Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.2774692Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.2774893Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.2775041Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.2775256Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.2775363Z self.run() 2023-01-11T22:51:00.2775569Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.2775715Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.2776043Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.2776186Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.2776529Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.2776862Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.2777241Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in 
run_test 2023-01-11T22:51:00.2777367Z getattr(self, test_name)() 2023-01-11T22:51:00.2777731Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.2777855Z getattr(self, test_name)() 2023-01-11T22:51:00.2778199Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.2778297Z fn() 2023-01-11T22:51:00.2778653Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.2778754Z fn() 2023-01-11T22:51:00.2779123Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.2779244Z test(self, **param_kwargs) 2023-01-11T22:51:00.2779608Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.2779712Z test(self, **param_kwargs) 2023-01-11T22:51:00.2780068Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.2780194Z return func(*args, **kwargs) 2023-01-11T22:51:00.2780552Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.2780674Z return func(*args, **kwargs) 2023-01-11T22:51:00.2780925Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.2781130Z self.run_subtests( 2023-01-11T22:51:00.2781383Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.2781477Z self.run_subtests( 2023-01-11T22:51:00.2781836Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.2781998Z 
test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.2782342Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.2782502Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.2782869Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.2783024Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.2783387Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.2783520Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.2783956Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.2784086Z output = model(*input) 2023-01-11T22:51:00.2784470Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.2784591Z output = model(*input) 2023-01-11T22:51:00.2784920Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.2785058Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.2785380Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.2785498Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.2785882Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.2786056Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.2786430Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.2786604Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 
2023-01-11T22:51:00.2786968Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.2787090Z _lazy_init(state, module) 2023-01-11T22:51:00.2787456Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.2787575Z _lazy_init(state, module) 2023-01-11T22:51:00.2787913Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.2788083Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.2788440Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.2788607Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.2789007Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.2789149Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.2789548Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.2789688Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.2790009Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.2790136Z return func(*args, **kwargs) 2023-01-11T22:51:00.2790538Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.2790670Z return func(*args, **kwargs) 2023-01-11T22:51:00.2791054Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.2791156Z p_assert( 2023-01-11T22:51:00.2791528Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.2791626Z p_assert( 2023-01-11T22:51:00.2792003Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.2792135Z traceback.print_stack() 2023-01-11T22:51:00.2792476Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.2792602Z traceback.print_stack() 2023-01-11T22:51:00.2792839Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2793076Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2793257Z File "", line 1, in 2023-01-11T22:51:00.2793478Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.2793605Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.2793808Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.2793960Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.2794174Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.2794277Z self.run() 2023-01-11T22:51:00.2794479Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.2794624Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.2794952Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.2795093Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.2795465Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.2795589Z getattr(self, test_name)() 2023-01-11T22:51:00.2795950Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.2796047Z fn() 2023-01-11T22:51:00.2796411Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.2796532Z test(self, **param_kwargs) 2023-01-11T22:51:00.2796867Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.2796990Z return func(*args, **kwargs) 2023-01-11T22:51:00.2797239Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.2797356Z self.run_subtests( 2023-01-11T22:51:00.2797711Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.2797872Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.2798235Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.2798386Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.2798744Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.2798862Z output = model(*input) 2023-01-11T22:51:00.2799188Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.2799326Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.2799764Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.2799943Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.2800315Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.2800440Z _lazy_init(state, module) 2023-01-11T22:51:00.2800776Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.2800946Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.2801347Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.2801490Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.2801829Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.2801957Z return func(*args, **kwargs) 2023-01-11T22:51:00.2802338Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.2802487Z p_assert( 2023-01-11T22:51:00.2802816Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.2802946Z traceback.print_stack() 2023-01-11T22:51:00.2803081Z File "", line 1, in 2023-01-11T22:51:00.2803295Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.2803439Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.2803641Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.2803791Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.2804003Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.2804094Z self.run() 2023-01-11T22:51:00.2804294Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.2804437Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.2804783Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.2804916Z 
self.run_test(test_name, pipe) 2023-01-11T22:51:00.2805278Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.2805400Z getattr(self, test_name)() 2023-01-11T22:51:00.2805759Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.2805840Z fn() 2023-01-11T22:51:00.2806204Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.2806326Z test(self, **param_kwargs) 2023-01-11T22:51:00.2806688Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.2806812Z return func(*args, **kwargs) 2023-01-11T22:51:00.2807064Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.2807179Z self.run_subtests( 2023-01-11T22:51:00.2807515Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.2807676Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.2808039Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.2808190Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.2808566Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.2808745Z output = model(*input) 2023-01-11T22:51:00.2809078Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.2809219Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.2809582Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:51:00.2809758Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.2810124Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.2810244Z _lazy_init(state, module) 2023-01-11T22:51:00.2810596Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.2810762Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.2811161Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.2811307Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.2811689Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.2811803Z return func(*args, **kwargs) 2023-01-11T22:51:00.2812190Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.2812297Z p_assert( 2023-01-11T22:51:00.2812636Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.2812764Z traceback.print_stack() 2023-01-11T22:51:00.2813004Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2813247Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2814009Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.2814757Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2814873Z File "", line 1, in 2023-01-11T22:51:00.2815085Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.2815229Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.2815432Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.2815582Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.2815797Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.2815901Z self.run() 2023-01-11T22:51:00.2816110Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.2816239Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.2816818Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.2816967Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.2817348Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.2817474Z getattr(self, test_name)() 2023-01-11T22:51:00.2817835Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.2817936Z fn() 2023-01-11T22:51:00.2818303Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.2818500Z test(self, **param_kwargs) 2023-01-11T22:51:00.2818862Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 
167, in wrapper 2023-01-11T22:51:00.2818990Z return func(*args, **kwargs) 2023-01-11T22:51:00.2819244Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.2819361Z self.run_subtests( 2023-01-11T22:51:00.2819716Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.2819882Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.2820247Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.2820384Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.2820760Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.2820880Z output = model(*input) 2023-01-11T22:51:00.2821263Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.2821413Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.2821797Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.2821975Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.2822343Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.2822447Z _lazy_init(state, module) 2023-01-11T22:51:00.2822801Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.2822975Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.2823376Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.2823520Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.2823857Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.2823983Z return func(*args, **kwargs) 2023-01-11T22:51:00.2824360Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.2824446Z p_assert( 2023-01-11T22:51:00.2824780Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.2824905Z traceback.print_stack() 2023-01-11T22:51:00.2825033Z File "", line 1, in 2023-01-11T22:51:00.2825240Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.2825386Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.2825588Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.2825741Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.2825939Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.2826042Z self.run() 2023-01-11T22:51:00.2826243Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.2826386Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.2826728Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.2826860Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.2827222Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.2827328Z getattr(self, test_name)() 2023-01-11T22:51:00.2827763Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.2827865Z fn() 2023-01-11T22:51:00.2828237Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.2828359Z test(self, **param_kwargs) 2023-01-11T22:51:00.2828719Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.2828846Z return func(*args, **kwargs) 2023-01-11T22:51:00.2829095Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.2829192Z self.run_subtests( 2023-01-11T22:51:00.2829543Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.2829704Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.2830072Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.2830271Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.2830660Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.2830783Z output = model(*input) 2023-01-11T22:51:00.2831109Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.2831229Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.2831605Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.2831779Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.2832144Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.2832268Z _lazy_init(state, module) 2023-01-11T22:51:00.2832623Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 
2023-01-11T22:51:00.2832793Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.2833193Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.2833317Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.2833656Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.2833780Z return func(*args, **kwargs) 2023-01-11T22:51:00.2834159Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.2834260Z p_assert( 2023-01-11T22:51:00.2834597Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.2834724Z traceback.print_stack() 2023-01-11T22:51:00.2834963Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2835182Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2835415Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2835646Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2835873Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2836094Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2836321Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2836546Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2836836Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.2837067Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2837274Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2837502Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2837727Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2837952Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2838173Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2838394Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2838621Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2838846Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2839102Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2839332Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2840101Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2840858Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.2841603Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2842347Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2843079Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2843827Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2844867Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_exec_order_utils.py:224: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 
2023-01-11T22:51:00.2845235Z local_num_valid_indices = torch.tensor([num_valid_indices], **tensor_kwargs) # type: ignore[arg-type, call-overload] 2023-01-11T22:51:00.2845526Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2845765Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2845999Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2846232Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2846465Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2846672Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2846906Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2847133Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2847358Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2847586Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2847852Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2848086Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2848314Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2848538Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2848745Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.2848970Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2849193Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2849418Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2849645Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2849868Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2850089Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2850312Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2850521Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2850745Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2851508Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2852250Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2853000Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2853731Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2854537Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2855273Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2856017Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2857311Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2858883Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2859764Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2860512Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2861243Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2861982Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2862721Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.2863462Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2864188Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2865028Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2865756Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2866495Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2867347Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2868099Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2868825Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.2869069Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2869309Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2869539Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2869774Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2869886Z dist init r=0, world=2 2023-01-11T22:51:00.2870203Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.2870525Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.2870854Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.2871180Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.2871494Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.2871820Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.2872142Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.2872454Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.2872845Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.2873163Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.2873472Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.2873778Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.2873892Z dist init r=1, world=2 2023-01-11T22:51:00.2874203Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.2874553Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.2874876Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.2875188Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.2875497Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.2875821Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.2876142Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.2876454Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.2876773Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.2877084Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.2877374Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.2877691Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 
2023-01-11T22:51:00.2877796Z ok (35.958s) 2023-01-11T22:51:00.2878018Z test_delayed_optim_step_offload_true_shard_grad_op (__main__.TestParityWithDDP) 2023-01-11T22:51:00.2878335Z Tests the FSDP forward, backward, and optimizer step runtime by ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 90091 2023-01-11T22:51:00.2878555Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 90092 2023-01-11T22:51:00.2878939Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.2879119Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.2879505Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.2879679Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.2880113Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.2880295Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.2880674Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.2880864Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.2881113Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.2881361Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.2881764Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.2882162Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:51:00.2882379Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.2882652Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.2882898Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2883130Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2884161Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.2884278Z warnings.warn( 2023-01-11T22:51:00.2885291Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:51:00.2885405Z warnings.warn( 2023-01-11T22:51:00.2885533Z File "", line 1, in 2023-01-11T22:51:00.2885748Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.2885874Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.2886078Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.2886227Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.2886444Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.2886546Z self.run() 2023-01-11T22:51:00.2886748Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.2886893Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.2887240Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.2887358Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.2887722Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.2887844Z getattr(self, test_name)() 2023-01-11T22:51:00.2888210Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.2888314Z fn() 2023-01-11T22:51:00.2888684Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.2888882Z test(self, **param_kwargs) 2023-01-11T22:51:00.2889252Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.2889363Z return func(*args, **kwargs) 2023-01-11T22:51:00.2889617Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.2889732Z self.run_subtests( 2023-01-11T22:51:00.2890092Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.2890258Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.2890623Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.2890774Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.2891150Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.2891256Z output = model(*input) 2023-01-11T22:51:00.2891627Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.2891777Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.2892213Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.2892391Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.2892763Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.2892886Z _lazy_init(state, module) 2023-01-11T22:51:00.2893241Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.2893391Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.2893800Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.2893945Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.2894285Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.2894410Z return func(*args, **kwargs) 2023-01-11T22:51:00.2894792Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.2894893Z p_assert( 2023-01-11T22:51:00.2895230Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.2895339Z traceback.print_stack() 2023-01-11T22:51:00.2895467Z File "", line 1, in 2023-01-11T22:51:00.2895675Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.2895821Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.2896023Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.2896180Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.2896392Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.2896478Z self.run() 2023-01-11T22:51:00.2896933Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.2897087Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.2897437Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.2897572Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.2897937Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.2898063Z getattr(self, test_name)() 2023-01-11T22:51:00.2898565Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.2898646Z fn() 2023-01-11T22:51:00.2899023Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.2899145Z test(self, **param_kwargs) 2023-01-11T22:51:00.2899503Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.2899629Z return func(*args, **kwargs) 2023-01-11T22:51:00.2899879Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.2899992Z self.run_subtests( 2023-01-11T22:51:00.2900348Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.2900494Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.2900866Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.2901016Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.2901483Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.2901615Z output = model(*input) 2023-01-11T22:51:00.2901949Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.2902089Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.2902468Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.2902625Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.2902993Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.2903119Z _lazy_init(state, module) 2023-01-11T22:51:00.2903473Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.2903642Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.2904041Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.2904185Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.2904523Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.2904630Z return func(*args, **kwargs) 2023-01-11T22:51:00.2905012Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.2905112Z p_assert( 2023-01-11T22:51:00.2905449Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.2905579Z traceback.print_stack() 2023-01-11T22:51:00.2905815Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2906053Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2906183Z File "", line 1, in 2023-01-11T22:51:00.2906377Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.2906518Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.2906719Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.2906870Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.2907082Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.2907184Z self.run() 2023-01-11T22:51:00.2907384Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.2907594Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.2907921Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.2908063Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.2908430Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.2908556Z getattr(self, test_name)() 2023-01-11T22:51:00.2908919Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.2909018Z fn() 2023-01-11T22:51:00.2909384Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.2909506Z test(self, **param_kwargs) 2023-01-11T22:51:00.2909844Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.2909974Z return func(*args, **kwargs) 2023-01-11T22:51:00.2910223Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.2910380Z self.run_subtests( 2023-01-11T22:51:00.2910746Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.2910913Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.2911274Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.2911425Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.2911783Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.2911902Z output = model(*input) 2023-01-11T22:51:00.2912226Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.2912368Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.2912748Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.2912924Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.2913290Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.2913411Z _lazy_init(state, module) 2023-01-11T22:51:00.2913747Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.2913916Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.2914314Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.2914455Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.2914797Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.2914922Z return func(*args, **kwargs) 2023-01-11T22:51:00.2915303Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.2915405Z p_assert( 2023-01-11T22:51:00.2915725Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.2915850Z traceback.print_stack() 2023-01-11T22:51:00.2915978Z File "", line 1, in 2023-01-11T22:51:00.2916186Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.2916326Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.2916528Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.2916679Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.2916932Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.2917040Z self.run() 2023-01-11T22:51:00.2917249Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.2917398Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.2917741Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.2917876Z 
self.run_test(test_name, pipe) 2023-01-11T22:51:00.2918238Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.2918363Z getattr(self, test_name)() 2023-01-11T22:51:00.2918704Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.2918802Z fn() 2023-01-11T22:51:00.2919169Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.2919294Z test(self, **param_kwargs) 2023-01-11T22:51:00.2919702Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.2919835Z return func(*args, **kwargs) 2023-01-11T22:51:00.2920091Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.2920207Z self.run_subtests( 2023-01-11T22:51:00.2920547Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.2920708Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.2921073Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.2921225Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.2921606Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.2921725Z output = model(*input) 2023-01-11T22:51:00.2922053Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.2922191Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.2922551Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:51:00.2922724Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.2923090Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.2923210Z _lazy_init(state, module) 2023-01-11T22:51:00.2923564Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.2923732Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.2924134Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.2924280Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.2924602Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.2924726Z return func(*args, **kwargs) 2023-01-11T22:51:00.2925106Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.2925208Z p_assert( 2023-01-11T22:51:00.2925546Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.2925672Z traceback.print_stack() 2023-01-11T22:51:00.2925910Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2926204Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.2926316Z File "", line 1, in 2023-01-11T22:51:00.2926534Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.2926679Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.2926884Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.2927033Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.2927246Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.2927349Z self.run() 2023-01-11T22:51:00.2927547Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.2927675Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.2928020Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.2928156Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.2928524Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.2928695Z getattr(self, test_name)() 2023-01-11T22:51:00.2929067Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.2929168Z fn() 2023-01-11T22:51:00.2929514Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.2929639Z test(self, **param_kwargs) 2023-01-11T22:51:00.2929995Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.2930119Z return func(*args, **kwargs) 2023-01-11T22:51:00.2930368Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.2930485Z self.run_subtests( 2023-01-11T22:51:00.2930836Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.2931001Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.2931349Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.2931501Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.2931873Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.2931991Z output = model(*input) 2023-01-11T22:51:00.2932315Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.2932453Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.2932826Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.2933004Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.2933371Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.2933476Z _lazy_init(state, module) 2023-01-11T22:51:00.2933830Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.2933998Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.2934397Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.2934540Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.2934877Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.2935000Z return func(*args, **kwargs) 2023-01-11T22:51:00.2935376Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.2935518Z p_assert( 2023-01-11T22:51:00.2935862Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.2935992Z traceback.print_stack() 2023-01-11T22:51:00.2936123Z File "", line 1, in 2023-01-11T22:51:00.2936334Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.2936478Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.2936915Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.2937057Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.2937273Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.2937379Z self.run() 2023-01-11T22:51:00.2937580Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.2937731Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.2938159Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.2938302Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.2938668Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.2938775Z getattr(self, test_name)() 2023-01-11T22:51:00.2939133Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.2939232Z fn() 2023-01-11T22:51:00.2939601Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.2939724Z test(self, **param_kwargs) 2023-01-11T22:51:00.2940078Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.2940206Z return func(*args, **kwargs) 2023-01-11T22:51:00.2940456Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.2940556Z self.run_subtests( 2023-01-11T22:51:00.2940910Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.2941071Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.2941435Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.2941587Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.2941962Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.2942081Z output = model(*input) 2023-01-11T22:51:00.2942404Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.2942528Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.2942907Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.2943082Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.2943448Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.2943568Z _lazy_init(state, module) 2023-01-11T22:51:00.2943919Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.2944086Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.2944486Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.2944612Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.2945029Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.2945157Z return func(*args, **kwargs) 2023-01-11T22:51:00.2945543Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.2945645Z p_assert( 2023-01-11T22:51:00.2945980Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.2946105Z traceback.print_stack() 2023-01-11T22:51:00.2946342Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2946561Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2946690Z File "", line 1, in 2023-01-11T22:51:00.2946897Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.2947045Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.2947245Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.2947442Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.2947665Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.2947769Z self.run() 2023-01-11T22:51:00.2947954Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.2948102Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.2948446Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.2948578Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.2948941Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.2949067Z getattr(self, test_name)() 2023-01-11T22:51:00.2949434Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.2949516Z fn() 2023-01-11T22:51:00.2949885Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.2950007Z test(self, **param_kwargs) 2023-01-11T22:51:00.2950363Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.2950485Z return func(*args, **kwargs) 2023-01-11T22:51:00.2950736Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.2950849Z self.run_subtests( 2023-01-11T22:51:00.2951203Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.2951347Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.2951714Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.2951865Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.2952243Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.2952362Z output = model(*input) 2023-01-11T22:51:00.2952688Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.2952825Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.2953200Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.2953356Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.2953720Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.2953914Z _lazy_init(state, module) 2023-01-11T22:51:00.2954273Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.2954450Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.2954854Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.2955001Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.2955343Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.2955471Z return func(*args, **kwargs) 2023-01-11T22:51:00.2955832Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.2955934Z p_assert( 2023-01-11T22:51:00.2956267Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.2956399Z traceback.print_stack() 2023-01-11T22:51:00.2956528Z File "", line 1, in 2023-01-11T22:51:00.2956783Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.2956934Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.2957118Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.2957271Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.2957486Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.2957592Z self.run() 2023-01-11T22:51:00.2957794Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.2957944Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.2958286Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.2958425Z 
self.run_test(test_name, pipe) 2023-01-11T22:51:00.2958773Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.2958901Z getattr(self, test_name)() 2023-01-11T22:51:00.2959261Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.2959360Z fn() 2023-01-11T22:51:00.2959724Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.2959846Z test(self, **param_kwargs) 2023-01-11T22:51:00.2960197Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.2960320Z return func(*args, **kwargs) 2023-01-11T22:51:00.2960555Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.2960671Z self.run_subtests( 2023-01-11T22:51:00.2961028Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.2961192Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.2961556Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.2961709Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.2962083Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.2962203Z output = model(*input) 2023-01-11T22:51:00.2962512Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.2962647Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.2963023Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:51:00.2963255Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.2963631Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.2963755Z _lazy_init(state, module) 2023-01-11T22:51:00.2964109Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.2964276Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.2964658Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.2964800Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.2965138Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.2965262Z return func(*args, **kwargs) 2023-01-11T22:51:00.2965645Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.2965747Z p_assert( 2023-01-11T22:51:00.2966129Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.2966265Z traceback.print_stack() 2023-01-11T22:51:00.2966485Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2966732Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.2966863Z File "", line 1, in 2023-01-11T22:51:00.2966992Z File "", line 1, in 2023-01-11T22:51:00.2967208Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.2967350Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.2967552Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.2967689Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.2967897Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.2968039Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.2968253Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.2968357Z self.run() 2023-01-11T22:51:00.2968557Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.2968706Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.2968905Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.2969033Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.2969246Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.2969348Z self.run() 2023-01-11T22:51:00.2969696Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.2969835Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.2970041Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.2970191Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.2970557Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.2970664Z getattr(self, test_name)() 2023-01-11T22:51:00.2970999Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in 
_run 2023-01-11T22:51:00.2971130Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.2971492Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.2971594Z fn() 2023-01-11T22:51:00.2971952Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.2972131Z getattr(self, test_name)() 2023-01-11T22:51:00.2972484Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.2972612Z test(self, **param_kwargs) 2023-01-11T22:51:00.2972973Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.2973072Z fn() 2023-01-11T22:51:00.2973429Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.2973555Z return func(*args, **kwargs) 2023-01-11T22:51:00.2973919Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.2974040Z test(self, **param_kwargs) 2023-01-11T22:51:00.2974272Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.2974389Z self.run_subtests( 2023-01-11T22:51:00.2974749Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.2974920Z return func(*args, **kwargs) 2023-01-11T22:51:00.2975284Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.2975449Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.2975701Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 
2023-01-11T22:51:00.2975813Z self.run_subtests( 2023-01-11T22:51:00.2976160Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.2976314Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.2976891Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.2977068Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.2977460Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.2977587Z output = model(*input) 2023-01-11T22:51:00.2977954Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.2978106Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.2978414Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.2978552Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.2978923Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.2979040Z output = model(*input) 2023-01-11T22:51:00.2979417Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.2979596Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.2979928Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.2980067Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.2980416Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.2980537Z _lazy_init(state, module) 2023-01-11T22:51:00.2980910Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.2981082Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.2981435Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.2981693Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.2982069Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.2982197Z _lazy_init(state, module) 2023-01-11T22:51:00.2982599Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.2982725Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.2983081Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.2983247Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.2983585Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.2983708Z return func(*args, **kwargs) 2023-01-11T22:51:00.2984109Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.2984254Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.2984690Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.2984782Z p_assert( 2023-01-11T22:51:00.2985126Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.2985254Z return func(*args, **kwargs) 2023-01-11T22:51:00.2985591Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.2985721Z traceback.print_stack() 2023-01-11T22:51:00.2986100Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.2986202Z p_assert( 2023-01-11T22:51:00.2986534Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.2986647Z traceback.print_stack() 2023-01-11T22:51:00.2986884Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2987122Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.2987252Z File "", line 1, in 2023-01-11T22:51:00.2987460Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.2987603Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.2987802Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.2987935Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.2988150Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.2988252Z self.run() 2023-01-11T22:51:00.2988455Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.2988603Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.2988945Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.2989083Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.2989449Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.2989555Z getattr(self, test_name)() 2023-01-11T22:51:00.2989917Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.2990017Z fn() 2023-01-11T22:51:00.2990380Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.2990501Z test(self, **param_kwargs) 2023-01-11T22:51:00.2990855Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.2991040Z return func(*args, **kwargs) 2023-01-11T22:51:00.2991296Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.2991395Z self.run_subtests( 2023-01-11T22:51:00.2991752Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.2991913Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.2992330Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.2992487Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.2992867Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.2992990Z output = model(*input) 2023-01-11T22:51:00.2993318Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.2993444Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.2993870Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.2994053Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.2994424Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.2994547Z _lazy_init(state, module) 2023-01-11T22:51:00.2994902Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.2995075Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.2995474Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.2995598Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.2995941Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.2996068Z return func(*args, **kwargs) 2023-01-11T22:51:00.2996448Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.2996550Z p_assert( 2023-01-11T22:51:00.2996881Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.2997008Z traceback.print_stack() 2023-01-11T22:51:00.2997136Z File "", line 1, in 2023-01-11T22:51:00.2997328Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.2997469Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.2997669Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.2997820Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.2998036Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.2998139Z self.run() 2023-01-11T22:51:00.2998346Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.2998474Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.2998816Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.2998950Z 
self.run_test(test_name, pipe) 2023-01-11T22:51:00.2999315Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.2999439Z getattr(self, test_name)() 2023-01-11T22:51:00.2999802Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.2999900Z fn() 2023-01-11T22:51:00.3000266Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.3000429Z test(self, **param_kwargs) 2023-01-11T22:51:00.3000797Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.3000923Z return func(*args, **kwargs) 2023-01-11T22:51:00.3001172Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.3001285Z self.run_subtests( 2023-01-11T22:51:00.3001638Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.3001799Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.3002165Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.3002299Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.3002678Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.3002797Z output = model(*input) 2023-01-11T22:51:00.3003167Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.3003313Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.3003697Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:51:00.3003873Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.3004238Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.3004342Z _lazy_init(state, module) 2023-01-11T22:51:00.3004704Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.3004878Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.3005291Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.3005421Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.3005760Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.3005883Z return func(*args, **kwargs) 2023-01-11T22:51:00.3006260Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.3006361Z p_assert( 2023-01-11T22:51:00.3006698Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.3006826Z traceback.print_stack() 2023-01-11T22:51:00.3007065Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3007288Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.3007419Z File "", line 1, in 2023-01-11T22:51:00.3007632Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.3007776Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.3007980Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.3008129Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.3008342Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.3008428Z self.run() 2023-01-11T22:51:00.3008629Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.3008775Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.3009117Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.3009307Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.3009676Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.3009806Z getattr(self, test_name)() 2023-01-11T22:51:00.3010170Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.3010250Z fn() 2023-01-11T22:51:00.3010615Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.3010739Z test(self, **param_kwargs) 2023-01-11T22:51:00.3011097Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.3011229Z return func(*args, **kwargs) 2023-01-11T22:51:00.3011478Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.3011594Z self.run_subtests( 2023-01-11T22:51:00.3011947Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.3012137Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.3012513Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.3012666Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.3013045Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.3013167Z output = model(*input) 2023-01-11T22:51:00.3013495Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.3013635Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.3014012Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.3014175Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.3014545Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.3014669Z _lazy_init(state, module) 2023-01-11T22:51:00.3015025Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.3015197Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.3015595Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.3015743Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.3016080Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.3016186Z return func(*args, **kwargs) 2023-01-11T22:51:00.3016796Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.3016914Z p_assert( 2023-01-11T22:51:00.3017266Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.3017395Z traceback.print_stack() 2023-01-11T22:51:00.3017524Z File "", line 1, in 2023-01-11T22:51:00.3017736Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.3017877Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.3018059Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.3018210Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.3018425Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.3018530Z self.run() 2023-01-11T22:51:00.3018734Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.3018981Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.3019330Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.3019465Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.3019811Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.3019935Z getattr(self, test_name)() 2023-01-11T22:51:00.3020300Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.3020396Z fn() 2023-01-11T22:51:00.3020767Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.3020890Z test(self, **param_kwargs) 2023-01-11T22:51:00.3021250Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.3021362Z return func(*args, **kwargs) 2023-01-11T22:51:00.3021674Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.3021797Z self.run_subtests( 2023-01-11T22:51:00.3022153Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.3022317Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.3022687Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.3022841Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.3023215Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.3023337Z output = model(*input) 2023-01-11T22:51:00.3023648Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.3023792Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.3024176Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.3024354Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.3024723Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.3024844Z _lazy_init(state, module) 2023-01-11T22:51:00.3025198Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.3025371Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.3025756Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.3025904Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.3026246Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.3026372Z return func(*args, **kwargs) 2023-01-11T22:51:00.3026751Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.3026858Z p_assert( 2023-01-11T22:51:00.3027196Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.3027326Z traceback.print_stack() 2023-01-11T22:51:00.3027547Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3027786Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3027917Z File "", line 1, in 2023-01-11T22:51:00.3028129Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.3028328Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.3028532Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.3028687Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.3028884Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.3028989Z self.run() 2023-01-11T22:51:00.3029190Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.3029334Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.3029680Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.3029812Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.3030178Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.3030301Z getattr(self, test_name)() 2023-01-11T22:51:00.3030648Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.3030745Z fn() 2023-01-11T22:51:00.3031159Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.3031290Z test(self, **param_kwargs) 2023-01-11T22:51:00.3031650Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.3031779Z return func(*args, **kwargs) 2023-01-11T22:51:00.3032033Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.3032149Z self.run_subtests( 2023-01-11T22:51:00.3032483Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.3032646Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.3033014Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.3033170Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.3033547Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.3033669Z output = model(*input) 2023-01-11T22:51:00.3033994Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.3034131Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.3034490Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.3034669Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.3035039Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.3035164Z _lazy_init(state, module) 2023-01-11T22:51:00.3035516Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.3035690Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.3036093Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.3036234Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.3036554Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.3036677Z return func(*args, **kwargs) 2023-01-11T22:51:00.3037062Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.3037162Z p_assert( 2023-01-11T22:51:00.3037502Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.3037685Z traceback.print_stack() 2023-01-11T22:51:00.3037815Z File "", line 1, in 2023-01-11T22:51:00.3038032Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.3038158Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.3038362Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.3038513Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.3038726Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.3038831Z self.run() 2023-01-11T22:51:00.3039031Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.3039176Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.3039504Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.3039640Z 
self.run_test(test_name, pipe) 2023-01-11T22:51:00.3040002Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.3040207Z getattr(self, test_name)() 2023-01-11T22:51:00.3040579Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.3040678Z fn() 2023-01-11T22:51:00.3041046Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.3041170Z test(self, **param_kwargs) 2023-01-11T22:51:00.3041509Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.3041635Z return func(*args, **kwargs) 2023-01-11T22:51:00.3041887Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.3042006Z self.run_subtests( 2023-01-11T22:51:00.3042362Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.3042528Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.3042897Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.3043050Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.3043409Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.3043530Z output = model(*input) 2023-01-11T22:51:00.3043858Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.3043997Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.3044378Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:51:00.3044554Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.3044925Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.3045047Z _lazy_init(state, module) 2023-01-11T22:51:00.3045383Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.3045554Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.3045954Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.3046095Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.3046436Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.3046563Z return func(*args, **kwargs) 2023-01-11T22:51:00.3047003Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.3047107Z p_assert( 2023-01-11T22:51:00.3047446Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.3047558Z traceback.print_stack() 2023-01-11T22:51:00.3047798Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3048035Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3048796Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.3049597Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3050359Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3051092Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3051843Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3052583Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3053335Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3054071Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3054206Z File "", line 1, in 2023-01-11T22:51:00.3054420Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.3054564Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.3054769Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.3054903Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.3055114Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.3055219Z self.run() 2023-01-11T22:51:00.3055421Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.3055567Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.3055972Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.3056107Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.3056462Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.3056810Z getattr(self, test_name)() 2023-01-11T22:51:00.3057190Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.3057292Z fn() 2023-01-11T22:51:00.3057661Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.3057787Z test(self, **param_kwargs) 2023-01-11T22:51:00.3058141Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.3058268Z return func(*args, **kwargs) 2023-01-11T22:51:00.3058507Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.3058621Z self.run_subtests( 2023-01-11T22:51:00.3059056Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.3059229Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.3059599Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.3059754Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.3060132Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.3060253Z output = model(*input) 2023-01-11T22:51:00.3060563Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.3060708Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.3061089Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.3061270Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.3061639Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.3061762Z _lazy_init(state, module) 2023-01-11T22:51:00.3062118Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.3062289Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.3062674Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, 
in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.3062817Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.3063159Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.3063288Z return func(*args, **kwargs) 2023-01-11T22:51:00.3063671Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.3063775Z p_assert( 2023-01-11T22:51:00.3064113Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.3064239Z traceback.print_stack() 2023-01-11T22:51:00.3064350Z File "", line 1, in 2023-01-11T22:51:00.3064562Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.3064701Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.3064904Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.3065056Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.3065269Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.3065454Z self.run() 2023-01-11T22:51:00.3065657Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.3065788Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.3066133Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.3066264Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.3066627Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.3066752Z getattr(self, test_name)() 2023-01-11T22:51:00.3067115Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.3067214Z fn() 
2023-01-11T22:51:00.3067582Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.3067692Z test(self, **param_kwargs) 2023-01-11T22:51:00.3068049Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.3068228Z return func(*args, **kwargs) 2023-01-11T22:51:00.3068485Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.3068602Z self.run_subtests( 2023-01-11T22:51:00.3068960Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.3069123Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.3069471Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.3069624Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.3070001Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.3070127Z output = model(*input) 2023-01-11T22:51:00.3070457Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.3070599Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.3070977Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.3071154Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.3071523Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.3071628Z _lazy_init(state, module) 2023-01-11T22:51:00.3071981Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.3072147Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.3072555Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.3072698Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.3073040Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.3073166Z return func(*args, **kwargs) 2023-01-11T22:51:00.3073548Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.3073633Z p_assert( 2023-01-11T22:51:00.3073973Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.3074101Z traceback.print_stack() 2023-01-11T22:51:00.3074340Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3074578Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.3074762Z File "", line 1, in 2023-01-11T22:51:00.3074977Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.3075124Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.3075308Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.3075459Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.3075669Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.3075770Z self.run() 2023-01-11T22:51:00.3075971Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.3076116Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.3076460Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.3076577Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.3076946Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.3077072Z getattr(self, test_name)() 2023-01-11T22:51:00.3077497Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.3077603Z fn() 2023-01-11T22:51:00.3077974Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.3078099Z test(self, **param_kwargs) 2023-01-11T22:51:00.3078452Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.3078560Z return func(*args, **kwargs) 2023-01-11T22:51:00.3078812Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.3078928Z self.run_subtests( 2023-01-11T22:51:00.3079282Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.3079450Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.3079823Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.3079976Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.3080355Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.3080457Z output = model(*input) 2023-01-11T22:51:00.3080783Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.3080921Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.3081299Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.3081473Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.3081846Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.3081974Z _lazy_init(state, module) 2023-01-11T22:51:00.3082329Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.3082481Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.3082880Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.3083024Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.3083363Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.3083486Z return func(*args, **kwargs) 2023-01-11T22:51:00.3083866Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.3084059Z p_assert( 2023-01-11T22:51:00.3084400Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.3084514Z traceback.print_stack() 2023-01-11T22:51:00.3084644Z File "", line 1, in 2023-01-11T22:51:00.3084857Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.3085001Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.3085204Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.3085355Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.3085569Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.3085674Z self.run() 2023-01-11T22:51:00.3085858Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.3086003Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.3086350Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.3086484Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.3086894Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.3087024Z getattr(self, test_name)() 2023-01-11T22:51:00.3087390Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.3087470Z fn() 2023-01-11T22:51:00.3087837Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.3087960Z test(self, **param_kwargs) 2023-01-11T22:51:00.3088317Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.3088443Z return func(*args, **kwargs) 2023-01-11T22:51:00.3088696Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.3088811Z self.run_subtests( 2023-01-11T22:51:00.3089169Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.3089313Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.3089678Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.3089829Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.3090207Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.3090329Z output = model(*input) 2023-01-11T22:51:00.3090658Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.3090799Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.3091177Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.3091338Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.3091709Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.3091832Z _lazy_init(state, module) 2023-01-11T22:51:00.3092235Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.3092407Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.3092812Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.3092958Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.3093363Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.3093490Z return func(*args, **kwargs) 2023-01-11T22:51:00.3093857Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.3093960Z p_assert( 2023-01-11T22:51:00.3094298Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.3094427Z traceback.print_stack() 2023-01-11T22:51:00.3094666Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3094902Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3095033Z File "", line 1, in 2023-01-11T22:51:00.3095227Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.3095374Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.3095581Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.3095733Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.3095995Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.3096105Z self.run() 2023-01-11T22:51:00.3096308Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.3096459Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.3097020Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.3097159Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.3097289Z File "", line 1, in 2023-01-11T22:51:00.3097661Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.3097785Z 
getattr(self, test_name)() 2023-01-11T22:51:00.3098156Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.3098256Z fn() 2023-01-11T22:51:00.3098627Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.3098733Z test(self, **param_kwargs) 2023-01-11T22:51:00.3098944Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.3099086Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.3099447Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.3099572Z return func(*args, **kwargs) 2023-01-11T22:51:00.3099770Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.3099922Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.3100155Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.3100271Z self.run_subtests( 2023-01-11T22:51:00.3100484Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.3100589Z self.run() 2023-01-11T22:51:00.3100946Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.3101109Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.3101316Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.3101463Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.3101811Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.3101965Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.3102303Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.3102524Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.3102913Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.3103034Z output = model(*input) 2023-01-11T22:51:00.3103397Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.3103522Z getattr(self, test_name)() 2023-01-11T22:51:00.3103831Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.3103969Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.3104327Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.3104427Z fn() 2023-01-11T22:51:00.3104806Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.3104985Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.3105414Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.3105547Z test(self, **param_kwargs) 2023-01-11T22:51:00.3105897Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.3106022Z _lazy_init(state, module) 2023-01-11T22:51:00.3106383Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.3106510Z return func(*args, **kwargs) 2023-01-11T22:51:00.3106870Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.3107040Z 
_share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.3107294Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.3107410Z self.run_subtests( 2023-01-11T22:51:00.3107797Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.3107939Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.3108292Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.3108456Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.3108796Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.3108920Z return func(*args, **kwargs) 2023-01-11T22:51:00.3109288Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.3109444Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.3109930Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.3110021Z p_assert( 2023-01-11T22:51:00.3110439Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.3110600Z output = model(*input) 2023-01-11T22:51:00.3110977Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.3111157Z traceback.print_stack() 2023-01-11T22:51:00.3111520Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.3111672Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.3112273Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.3112588Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.3112948Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.3113115Z _lazy_init(state, module) 2023-01-11T22:51:00.3113512Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.3113729Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.3114169Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.3114354Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.3114731Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.3114945Z return func(*args, **kwargs) 2023-01-11T22:51:00.3115312Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.3115456Z p_assert( 2023-01-11T22:51:00.3115886Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.3116057Z traceback.print_stack() 2023-01-11T22:51:00.3116333Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3116607Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3117410Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.3118185Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3118449Z File "", line 1, in 2023-01-11T22:51:00.3118718Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.3118848Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.3119089Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.3119279Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.3119532Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.3119673Z self.run() 2023-01-11T22:51:00.3119916Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.3120100Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.3120428Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.3120654Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.3121065Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.3121227Z getattr(self, test_name)() 2023-01-11T22:51:00.3121626Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.3121760Z fn() 2023-01-11T22:51:00.3122163Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.3122325Z test(self, **param_kwargs) 2023-01-11T22:51:00.3122660Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 
167, in wrapper 2023-01-11T22:51:00.3122869Z return func(*args, **kwargs) 2023-01-11T22:51:00.3123193Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.3123404Z self.run_subtests( 2023-01-11T22:51:00.3123805Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.3124006Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.3124411Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.3124601Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.3124961Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.3125130Z output = model(*input) 2023-01-11T22:51:00.3125493Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.3125703Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.3126131Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.3126345Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.3126798Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.3126975Z _lazy_init(state, module) 2023-01-11T22:51:00.3127318Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.3127528Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.3128003Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.3128181Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.3128598Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.3128766Z return func(*args, **kwargs) 2023-01-11T22:51:00.3129186Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.3129344Z p_assert( 2023-01-11T22:51:00.3129668Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.3129834Z traceback.print_stack() 2023-01-11T22:51:00.3130000Z File "", line 1, in 2023-01-11T22:51:00.3130250Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.3130430Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.3130707Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.3130901Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.3131162Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.3131256Z self.run() 2023-01-11T22:51:00.3131492Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.3131674Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.3132092Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.3132263Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.3132661Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.3132858Z getattr(self, test_name)() 2023-01-11T22:51:00.3133204Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.3133352Z fn() 2023-01-11T22:51:00.3133760Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.3133919Z test(self, **param_kwargs) 2023-01-11T22:51:00.3134374Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.3134534Z return func(*args, **kwargs) 2023-01-11T22:51:00.3134823Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:51:00.3134974Z self.run_subtests( 2023-01-11T22:51:00.3135318Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.3135567Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.3135969Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.3136159Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.3136824Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.3137040Z output = model(*input) 2023-01-11T22:51:00.3137420Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.3137691Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.3138075Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.3138299Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.3138753Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.3138915Z _lazy_init(state, module) 2023-01-11T22:51:00.3139312Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 
2023-01-11T22:51:00.3139520Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.3139962Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.3140158Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.3140545Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.3140655Z return func(*args, **kwargs) 2023-01-11T22:51:00.3141071Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.3141247Z p_assert( 2023-01-11T22:51:00.3141624Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.3141787Z traceback.print_stack() 2023-01-11T22:51:00.3142088Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3142359Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3142611Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3142834Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3143092Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3143348Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3143642Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3143907Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3144171Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.3144444Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3144702Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3144992Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3145259Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3145528Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3145791Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3146087Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3146351Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3146642Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3146893Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3147099Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3147941Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3148727Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.3149506Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3150306Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3151132Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3151923Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3153008Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_exec_order_utils.py:224: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 
2023-01-11T22:51:00.3153418Z local_num_valid_indices = torch.tensor([num_valid_indices], **tensor_kwargs) # type: ignore[arg-type, call-overload] 2023-01-11T22:51:00.3153690Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3153962Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3154235Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3154513Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3154874Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3155173Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3155439Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3155651Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3155917Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3156176Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3156434Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3156708Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3156977Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3157273Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3157582Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.3157795Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3158061Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3158321Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3158590Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3158854Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3159113Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3159376Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3159714Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3159976Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3160727Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3161505Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3162309Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3163080Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3163862Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3164696Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3165484Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3166299Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3167209Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3168004Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3168784Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3169556Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3170336Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3171140Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.3171927Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3172739Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3173519Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3174344Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3175118Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3175889Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3176983Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3177802Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3178079Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3178395Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3178664Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3178940Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3179089Z dist init r=0, world=2 2023-01-11T22:51:00.3179460Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.3179767Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.3180184Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.3180541Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.3180889Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.3181293Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.3181650Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.3182001Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.3182364Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.3182726Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.3183149Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.3183512Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.3183664Z dist init r=1, world=2 2023-01-11T22:51:00.3183962Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.3184345Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.3184699Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.3185053Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.3185484Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.3185850Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.3186198Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.3186545Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.3186903Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.3187296Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.3187659Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.3188015Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 
2023-01-11T22:51:00.3188102Z ok (35.961s) 2023-01-11T22:51:00.3188359Z test_delayed_reduce_scatter_offload_false_no_shard (__main__.TestParityWithDDP) 2023-01-11T22:51:00.3188711Z Tests the FSDP forward, backward, and optimizer step runtime by ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 90174 2023-01-11T22:51:00.3188972Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 90175 2023-01-11T22:51:00.3189403Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.3189617Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.3190084Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.3190315Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.3190671Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.3190882Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.3191325Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.3191606Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.3191890Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.3192227Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.3192687Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.3193164Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:51:00.3193434Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.3193647Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.3193920Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3194190Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3195316Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.3195488Z warnings.warn( 2023-01-11T22:51:00.3196539Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.3196691Z warnings.warn( 2023-01-11T22:51:00.3197001Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3197302Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3197570Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.3197835Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3198049Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3198320Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3198583Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3198846Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3199111Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3199404Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3199671Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3199936Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3200147Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3200420Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3200680Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3200940Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3201805Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3202605Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3203451Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3204302Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3205083Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3205866Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3206644Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.3207427Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3208195Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3208975Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3209791Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3210569Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3211393Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3212166Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3212933Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3213748Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3214562Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3215345Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3216147Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3217180Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3217966Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3218743Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3219511Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3220301Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.3221069Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3221939Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3222755Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3223592Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3224377Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3225163Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3225936Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3226747Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3227512Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3228294Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3229126Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3229904Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3230681Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3231516Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3232284Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3233100Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3233883Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.3234660Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3235475Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3236254Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3236528Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3236750Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3237047Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3237314Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3237585Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3237850Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3238132Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3238427Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.3238692Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3238905Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3239170Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3239428Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3239745Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3240009Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3240287Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3240549Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3240845Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3241112Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3241321Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3241615Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3241876Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3242144Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3242415Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3242719Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3243520Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3244327Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3245115Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3245877Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3246668Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3247446Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.3248222Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3248987Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3249822Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3250624Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3251430Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3252256Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3253039Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3253804Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3254583Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3255347Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3256129Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3257193Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3257996Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3258768Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3259645Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3260418Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3261187Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.3262017Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3262827Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3263623Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3264355Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3265125Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3265897Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3266667Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3267442Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3268225Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3268995Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3269819Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3270627Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3271457Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3272240Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3273021Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3273794Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3274593Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.3275363Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3276125Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3276941Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3277705Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3278484Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3279308Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3280084Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3280889Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3281671Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3282432Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3283253Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3284022Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3284789Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3285553Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3286358Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3287117Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3287888Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.3288716Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3289523Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3290337Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3291117Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3291879Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3292205Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3292499Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.3292772Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3293042Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3293350Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3293564Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3293716Z dist init r=1, world=2 2023-01-11T22:51:00.3293889Z dist init r=0, world=2 2023-01-11T22:51:00.3294029Z ok (5.813s) 2023-01-11T22:51:00.3294291Z test_delayed_reduce_scatter_offload_false_none (__main__.TestParityWithDDP) 2023-01-11T22:51:00.3295269Z Tests the FSDP forward, backward, and optimizer step runtime by ... skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/82704 for platform(s) linux, rocm. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (0.001s) 2023-01-11T22:51:00.3295534Z test_delayed_reduce_scatter_offload_false_shard_grad_op (__main__.TestParityWithDDP) 2023-01-11T22:51:00.3296463Z Tests the FSDP forward, backward, and optimizer step runtime by ... skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/82398 for platform(s) linux, rocm. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (0.001s) 2023-01-11T22:51:00.3297088Z test_delayed_reduce_scatter_offload_true_no_shard (__main__.TestParityWithDDP) 2023-01-11T22:51:00.3297453Z Tests the FSDP forward, backward, and optimizer step runtime by ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 90257 2023-01-11T22:51:00.3297807Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 90258 2023-01-11T22:51:00.3298189Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.3298418Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.3298837Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.3299067Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.3299476Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.3299690Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.3300143Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.3300413Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.3300722Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.3301015Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.3301457Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.3301893Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:51:00.3302159Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.3302422Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.3302696Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3303020Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3304106Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.3304259Z warnings.warn( 2023-01-11T22:51:00.3305308Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:51:00.3305457Z warnings.warn( 2023-01-11T22:51:00.3305576Z File "", line 1, in 2023-01-11T22:51:00.3305825Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.3306005Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.3306259Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.3306519Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.3306771Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.3306910Z self.run() 2023-01-11T22:51:00.3307096Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.3307278Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.3307663Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.3307890Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.3308315Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.3308478Z getattr(self, test_name)() 2023-01-11T22:51:00.3308911Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.3309048Z fn() 2023-01-11T22:51:00.3309401Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.3309561Z test(self, **param_kwargs) 2023-01-11T22:51:00.3309959Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.3310122Z return func(*args, **kwargs) 2023-01-11T22:51:00.3310423Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:51:00.3310578Z self.run_subtests( 2023-01-11T22:51:00.3311050Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.3311289Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.3311644Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.3311833Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.3312255Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.3312421Z output = model(*input) 2023-01-11T22:51:00.3312785Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.3312962Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.3313381Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.3313595Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.3313951Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.3314143Z _lazy_init(state, module) 2023-01-11T22:51:00.3314536Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.3314751Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.3315194Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.3315375Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.3315754Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.3315953Z return func(*args, **kwargs) 2023-01-11T22:51:00.3316319Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.3316468Z p_assert( 2023-01-11T22:51:00.3316879Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.3317054Z traceback.print_stack() 2023-01-11T22:51:00.3317220Z File "", line 1, in 2023-01-11T22:51:00.3317467Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.3317645Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.3317883Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.3318019Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.3318265Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.3318462Z self.run() 2023-01-11T22:51:00.3318743Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.3318924Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.3319314Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.3319487Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.3319839Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.3320000Z getattr(self, test_name)() 2023-01-11T22:51:00.3320425Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.3320560Z fn() 2023-01-11T22:51:00.3320977Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.3321171Z test(self, **param_kwargs) 2023-01-11T22:51:00.3321576Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.3321737Z return func(*args, **kwargs) 2023-01-11T22:51:00.3322022Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:51:00.3322180Z self.run_subtests( 2023-01-11T22:51:00.3322574Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.3322783Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.3323184Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.3323372Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.3323823Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.3323985Z output = model(*input) 2023-01-11T22:51:00.3324296Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.3324475Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.3324896Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.3325155Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.3325561Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.3325720Z _lazy_init(state, module) 2023-01-11T22:51:00.3326113Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.3326353Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.3326742Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.3326929Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.3327306Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.3327480Z return func(*args, **kwargs) 2023-01-11T22:51:00.3327898Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.3328042Z p_assert( 2023-01-11T22:51:00.3328416Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.3328576Z traceback.print_stack() 2023-01-11T22:51:00.3328798Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3329110Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3329275Z File "", line 1, in 2023-01-11T22:51:00.3329592Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.3329811Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.3330055Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.3330245Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.3330494Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.3330580Z self.run() 2023-01-11T22:51:00.3330819Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.3331036Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.3331434Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.3331604Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.3332004Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.3332168Z getattr(self, test_name)() 2023-01-11T22:51:00.3332620Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.3332707Z fn() 2023-01-11T22:51:00.3333112Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.3333285Z test(self, **param_kwargs) 2023-01-11T22:51:00.3333710Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.3333874Z return func(*args, **kwargs) 2023-01-11T22:51:00.3334198Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:51:00.3334349Z self.run_subtests( 2023-01-11T22:51:00.3334740Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.3334892Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.3335294Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.3335496Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.3335910Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.3336100Z output = model(*input) 2023-01-11T22:51:00.3336464Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.3336889Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.3337328Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.3337491Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.3337906Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.3338075Z _lazy_init(state, module) 2023-01-11T22:51:00.3338469Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.3338675Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.3339156Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.3339338Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.3339747Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.3339857Z return func(*args, **kwargs) 2023-01-11T22:51:00.3340277Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.3340521Z p_assert( 2023-01-11T22:51:00.3340902Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.3341064Z traceback.print_stack() 2023-01-11T22:51:00.3341237Z File "", line 1, in 2023-01-11T22:51:00.3341526Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.3341708Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.3341896Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.3342084Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.3342344Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.3342487Z self.run() 2023-01-11T22:51:00.3342724Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.3342907Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.3343295Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.3343414Z 
self.run_test(test_name, pipe) 2023-01-11T22:51:00.3343940Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.3344122Z getattr(self, test_name)() 2023-01-11T22:51:00.3344529Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.3344662Z fn() 2023-01-11T22:51:00.3345067Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.3345233Z test(self, **param_kwargs) 2023-01-11T22:51:00.3345629Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.3345737Z return func(*args, **kwargs) 2023-01-11T22:51:00.3346025Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:51:00.3346248Z self.run_subtests( 2023-01-11T22:51:00.3346651Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.3346850Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.3347250Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.3347436Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.3347851Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.3347953Z output = model(*input) 2023-01-11T22:51:00.3348319Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.3348503Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.3348987Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:51:00.3349203Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.3349610Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.3349768Z _lazy_init(state, module) 2023-01-11T22:51:00.3350159Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.3350313Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.3350753Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.3350945Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.3351322Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.3351577Z return func(*args, **kwargs) 2023-01-11T22:51:00.3452891Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.3453073Z p_assert( 2023-01-11T22:51:00.3453524Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.3453643Z traceback.print_stack() 2023-01-11T22:51:00.3453871Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3454099Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.3454212Z File "", line 1, in 2023-01-11T22:51:00.3454417Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.3454547Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.3454738Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.3454886Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.3455377Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.3455488Z self.run() 2023-01-11T22:51:00.3455682Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.3455810Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.3456152Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.3456275Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.3456862Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.3456985Z getattr(self, test_name)() 2023-01-11T22:51:00.3457346Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.3457440Z fn() 2023-01-11T22:51:00.3457797Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.3457907Z test(self, **param_kwargs) 2023-01-11T22:51:00.3458250Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.3458364Z return func(*args, **kwargs) 2023-01-11T22:51:00.3458602Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:51:00.3458705Z self.run_subtests( 2023-01-11T22:51:00.3459042Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.3459191Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.3459542Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.3459681Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.3460049Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.3460157Z output = model(*input) 2023-01-11T22:51:00.3460470Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.3460597Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.3460962Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.3461124Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.3461478Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.3461581Z _lazy_init(state, module) 2023-01-11T22:51:00.3461924Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.3462196Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.3462589Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.3462721Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.3463046Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.3463161Z return func(*args, **kwargs) 2023-01-11T22:51:00.3463526Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.3463612Z p_assert( 2023-01-11T22:51:00.3463934Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.3464049Z traceback.print_stack() 2023-01-11T22:51:00.3464169Z File "", line 1, in 2023-01-11T22:51:00.3464365Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.3464495Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.3464748Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.3464899Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.3465094Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.3465186Z self.run() 2023-01-11T22:51:00.3465376Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.3465510Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.3465841Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.3465962Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.3466310Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.3466422Z getattr(self, test_name)() 2023-01-11T22:51:00.3466833Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.3466926Z fn() 2023-01-11T22:51:00.3467283Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.3467394Z test(self, **param_kwargs) 2023-01-11T22:51:00.3467736Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.3467849Z return func(*args, **kwargs) 2023-01-11T22:51:00.3468088Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:51:00.3468184Z self.run_subtests( 2023-01-11T22:51:00.3468521Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.3468675Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.3469029Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.3469170Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.3469532Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.3469640Z output = model(*input) 2023-01-11T22:51:00.3469951Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.3470070Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.3470431Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.3470594Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.3471018Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.3471129Z _lazy_init(state, module) 2023-01-11T22:51:00.3471472Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.3471628Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.3472014Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.3472145Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.3472463Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.3472576Z return func(*args, **kwargs) 2023-01-11T22:51:00.3472940Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.3473035Z p_assert( 2023-01-11T22:51:00.3473359Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.3473521Z traceback.print_stack() 2023-01-11T22:51:00.3473754Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3473971Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3474089Z File "", line 1, in 2023-01-11T22:51:00.3474284Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.3474414Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.3474604Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.3474742Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.3474941Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.3475037Z self.run() 2023-01-11T22:51:00.3475220Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.3475354Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.3475687Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.3475808Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.3476154Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.3476265Z getattr(self, test_name)() 2023-01-11T22:51:00.3476609Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.3476695Z fn() 2023-01-11T22:51:00.3477037Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.3477151Z test(self, **param_kwargs) 2023-01-11T22:51:00.3477501Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.3477613Z return func(*args, **kwargs) 2023-01-11T22:51:00.3477854Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:51:00.3477957Z self.run_subtests( 2023-01-11T22:51:00.3478295Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.3478443Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.3478788Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.3478929Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.3479288Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.3479455Z output = model(*input) 2023-01-11T22:51:00.3479770Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.3479900Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.3480262Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.3480424Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.3480768Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.3480876Z _lazy_init(state, module) 2023-01-11T22:51:00.3481213Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.3481368Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.3481751Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.3481884Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.3482260Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.3482380Z return func(*args, **kwargs) 2023-01-11T22:51:00.3482742Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.3482833Z p_assert( 2023-01-11T22:51:00.3483153Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.3483269Z traceback.print_stack() 2023-01-11T22:51:00.3483386Z File "", line 1, in 2023-01-11T22:51:00.3483582Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.3483713Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.3483906Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.3484040Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.3484239Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.3484332Z self.run() 2023-01-11T22:51:00.3484521Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.3484654Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.3484979Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.3485100Z 
self.run_test(test_name, pipe) 2023-01-11T22:51:00.3485442Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.3485554Z getattr(self, test_name)() 2023-01-11T22:51:00.3485896Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.3485986Z fn() 2023-01-11T22:51:00.3486335Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.3486450Z test(self, **param_kwargs) 2023-01-11T22:51:00.3486792Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.3486904Z return func(*args, **kwargs) 2023-01-11T22:51:00.3487135Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:51:00.3487238Z self.run_subtests( 2023-01-11T22:51:00.3487575Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.3487725Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.3488074Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.3488287Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.3488657Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.3488766Z output = model(*input) 2023-01-11T22:51:00.3489071Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.3489197Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.3489556Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:51:00.3489717Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.3490068Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.3490177Z _lazy_init(state, module) 2023-01-11T22:51:00.3490518Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.3490677Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.3491133Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.3491272Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.3491602Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.3491718Z return func(*args, **kwargs) 2023-01-11T22:51:00.3492083Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.3492227Z p_assert( 2023-01-11T22:51:00.3492560Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.3492676Z traceback.print_stack() 2023-01-11T22:51:00.3492898Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3493124Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.3493246Z File "", line 1, in 2023-01-11T22:51:00.3493444Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.3493574Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.3493762Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.3493900Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.3494100Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.3494187Z self.run() 2023-01-11T22:51:00.3494377Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.3494510Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.3494844Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.3494966Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.3495318Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.3495432Z getattr(self, test_name)() 2023-01-11T22:51:00.3495777Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.3495856Z fn() 2023-01-11T22:51:00.3496204Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.3496318Z test(self, **param_kwargs) 2023-01-11T22:51:00.3496903Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.3497024Z return func(*args, **kwargs) 2023-01-11T22:51:00.3497356Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:51:00.3497460Z self.run_subtests( 2023-01-11T22:51:00.3497806Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.3497958Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.3498311Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.3498451Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.3498812Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.3498919Z output = model(*input) 2023-01-11T22:51:00.3499230Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.3499356Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.3499720Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.3499942Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.3500306Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.3500416Z _lazy_init(state, module) 2023-01-11T22:51:00.3500755Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.3500912Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.3501295Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.3501426Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.3501751Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.3501863Z return func(*args, **kwargs) 2023-01-11T22:51:00.3502226Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.3502316Z p_assert( 2023-01-11T22:51:00.3502638Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.3502752Z traceback.print_stack() 2023-01-11T22:51:00.3502871Z File "", line 1, in 2023-01-11T22:51:00.3503068Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.3503192Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.3503381Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.3503519Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.3503717Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.3503812Z self.run() 2023-01-11T22:51:00.3504002Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.3504138Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.3504467Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.3504582Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.3504927Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.3505038Z getattr(self, test_name)() 2023-01-11T22:51:00.3505384Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.3505471Z fn() 2023-01-11T22:51:00.3505820Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.3505991Z test(self, **param_kwargs) 2023-01-11T22:51:00.3506339Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.3506449Z return func(*args, **kwargs) 2023-01-11T22:51:00.3506688Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:51:00.3506790Z self.run_subtests( 2023-01-11T22:51:00.3507132Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.3507281Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.3507628Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.3507771Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.3508136Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.3508241Z output = model(*input) 2023-01-11T22:51:00.3508556Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.3508772Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.3509146Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.3509307Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.3509659Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.3509769Z _lazy_init(state, module) 2023-01-11T22:51:00.3510108Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.3510257Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.3510641Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.3510777Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.3511107Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.3511220Z return func(*args, **kwargs) 2023-01-11T22:51:00.3511584Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.3511675Z p_assert( 2023-01-11T22:51:00.3511999Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.3512107Z traceback.print_stack() 2023-01-11T22:51:00.3512331Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3512554Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3512677Z File "", line 1, in 2023-01-11T22:51:00.3512873Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.3513004Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.3513195Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.3513335Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.3513528Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.3513619Z self.run() 2023-01-11T22:51:00.3513807Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.3513943Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.3514272Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.3514395Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.3514740Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.3514911Z getattr(self, test_name)() 2023-01-11T22:51:00.3515255Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.3515344Z fn() 2023-01-11T22:51:00.3515696Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.3515811Z test(self, **param_kwargs) 2023-01-11T22:51:00.3516158Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.3516272Z return func(*args, **kwargs) 2023-01-11T22:51:00.3516512Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:51:00.3516608Z self.run_subtests( 2023-01-11T22:51:00.3516946Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.3517099Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.3517498Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.3517644Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.3518007Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.3518115Z output = model(*input) 2023-01-11T22:51:00.3518426Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.3518545Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.3518907Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.3519069Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.3519427Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.3519537Z _lazy_init(state, module) 2023-01-11T22:51:00.3519879Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.3520036Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.3520418Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.3520550Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.3520871Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.3520985Z return func(*args, **kwargs) 2023-01-11T22:51:00.3521350Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.3521445Z p_assert( 2023-01-11T22:51:00.3521768Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.3521886Z traceback.print_stack() 2023-01-11T22:51:00.3522004Z File "", line 1, in 2023-01-11T22:51:00.3522193Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.3522324Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.3522514Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.3522651Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.3522849Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.3522942Z self.run() 2023-01-11T22:51:00.3523133Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.3523268Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.3523648Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.3523770Z 
self.run_test(test_name, pipe) 2023-01-11T22:51:00.3524123Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.3524237Z getattr(self, test_name)() 2023-01-11T22:51:00.3524584Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.3524670Z fn() 2023-01-11T22:51:00.3525021Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.3525145Z test(self, **param_kwargs) 2023-01-11T22:51:00.3525482Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.3525604Z return func(*args, **kwargs) 2023-01-11T22:51:00.3525856Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:51:00.3525967Z self.run_subtests( 2023-01-11T22:51:00.3526375Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.3526543Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.3526906Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.3527058Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.3527412Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.3527530Z output = model(*input) 2023-01-11T22:51:00.3527856Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.3527998Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.3528370Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:51:00.3528545Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.3528907Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.3529027Z _lazy_init(state, module) 2023-01-11T22:51:00.3529358Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.3529523Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.3529916Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.3530056Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.3530388Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.3530515Z return func(*args, **kwargs) 2023-01-11T22:51:00.3530888Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.3530992Z p_assert( 2023-01-11T22:51:00.3531309Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.3531432Z traceback.print_stack() 2023-01-11T22:51:00.3531665Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3531899Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.3532026Z File "", line 1, in 2023-01-11T22:51:00.3532232Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.3532373Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.3532629Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.3532763Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.3532976Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.3533079Z self.run() 2023-01-11T22:51:00.3533278Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.3533422Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.3533764Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.3533895Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.3534234Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.3534357Z getattr(self, test_name)() 2023-01-11T22:51:00.3534710Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.3534811Z fn() 2023-01-11T22:51:00.3535171Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.3535339Z test(self, **param_kwargs) 2023-01-11T22:51:00.3535702Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.3535828Z return func(*args, **kwargs) 2023-01-11T22:51:00.3536062Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:51:00.3536175Z self.run_subtests( 2023-01-11T22:51:00.3536523Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.3536928Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.3537298Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.3537454Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.3537833Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.3537953Z output = model(*input) 2023-01-11T22:51:00.3538261Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.3538397Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.3538768Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.3538940Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.3539303Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.3539423Z _lazy_init(state, module) 2023-01-11T22:51:00.3539777Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.3539944Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.3540341Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.3540466Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.3540805Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.3540929Z return func(*args, **kwargs) 2023-01-11T22:51:00.3541304Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.3541405Z p_assert( 2023-01-11T22:51:00.3541739Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.3541864Z traceback.print_stack() 2023-01-11T22:51:00.3542065Z File "", line 1, in 2023-01-11T22:51:00.3542273Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.3542421Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.3542623Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.3542777Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.3542986Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.3543090Z self.run() 2023-01-11T22:51:00.3543289Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.3543417Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.3543761Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.3543892Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.3544254Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.3544377Z getattr(self, test_name)() 2023-01-11T22:51:00.3544796Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.3544904Z fn() 2023-01-11T22:51:00.3545273Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.3545379Z test(self, **param_kwargs) 2023-01-11T22:51:00.3545732Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.3545857Z return func(*args, **kwargs) 2023-01-11T22:51:00.3546107Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:51:00.3546219Z self.run_subtests( 2023-01-11T22:51:00.3546568Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.3546733Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.3547094Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.3547231Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.3547599Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.3547716Z output = model(*input) 2023-01-11T22:51:00.3548038Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.3548173Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.3548544Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.3548715Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.3549081Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.3549187Z _lazy_init(state, module) 2023-01-11T22:51:00.3549537Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.3549702Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.3550094Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.3550233Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.3550567Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.3550690Z return func(*args, **kwargs) 2023-01-11T22:51:00.3551062Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.3551201Z p_assert( 2023-01-11T22:51:00.3551537Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.3551667Z traceback.print_stack() 2023-01-11T22:51:00.3551904Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3552136Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3552265Z File "", line 1, in 2023-01-11T22:51:00.3552473Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.3552614Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.3552795Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.3552945Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.3553152Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.3553258Z self.run() 2023-01-11T22:51:00.3553458Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.3553666Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.3554020Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.3554135Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.3554493Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.3554616Z getattr(self, test_name)() 2023-01-11T22:51:00.3554971Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.3555070Z fn() 2023-01-11T22:51:00.3555428Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.3555556Z test(self, **param_kwargs) 2023-01-11T22:51:00.3555909Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.3556018Z return func(*args, **kwargs) 2023-01-11T22:51:00.3556268Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:51:00.3556381Z self.run_subtests( 2023-01-11T22:51:00.3556727Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.3556888Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.3557247Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.3557398Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.3557770Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.3557875Z output = model(*input) 2023-01-11T22:51:00.3558197Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.3558337Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.3558709Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.3558882Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.3559242Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.3559361Z _lazy_init(state, module) 2023-01-11T22:51:00.3559708Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.3559858Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.3560342Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.3560485Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.3560826Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.3560949Z return func(*args, **kwargs) 2023-01-11T22:51:00.3561325Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.3561427Z p_assert( 2023-01-11T22:51:00.3561762Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.3561871Z traceback.print_stack() 2023-01-11T22:51:00.3561999Z File "", line 1, in 2023-01-11T22:51:00.3562204Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.3562344Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.3562548Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.3562696Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.3562951Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.3563062Z self.run() 2023-01-11T22:51:00.3563245Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.3563391Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.3563730Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.3563864Z 
self.run_test(test_name, pipe) 2023-01-11T22:51:00.3564223Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.3564346Z getattr(self, test_name)() 2023-01-11T22:51:00.3564703Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.3564807Z fn() 2023-01-11T22:51:00.3565153Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.3565276Z test(self, **param_kwargs) 2023-01-11T22:51:00.3565630Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.3565753Z return func(*args, **kwargs) 2023-01-11T22:51:00.3566000Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:51:00.3566113Z self.run_subtests( 2023-01-11T22:51:00.3566461Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.3566605Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.3566961Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.3567115Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.3567489Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.3567607Z output = model(*input) 2023-01-11T22:51:00.3567931Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.3568066Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.3568441Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:51:00.3568613Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.3568962Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.3569082Z _lazy_init(state, module) 2023-01-11T22:51:00.3569492Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.3569661Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.3570058Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.3570200Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.3570539Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.3570663Z return func(*args, **kwargs) 2023-01-11T22:51:00.3571023Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.3571124Z p_assert( 2023-01-11T22:51:00.3571457Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.3571585Z traceback.print_stack() 2023-01-11T22:51:00.3571819Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3572097Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3572858Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.3573588Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3574331Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3575070Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3575803Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3576789Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3577562Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3578294Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3578428Z File "", line 1, in 2023-01-11T22:51:00.3578709Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.3578851Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.3579057Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.3579207Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.3579417Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.3579521Z self.run() 2023-01-11T22:51:00.3579721Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.3579849Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.3580195Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.3580330Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.3580690Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.3580818Z getattr(self, test_name)() 2023-01-11T22:51:00.3581178Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.3581337Z fn() 2023-01-11T22:51:00.3581713Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.3581819Z test(self, **param_kwargs) 2023-01-11T22:51:00.3582173Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.3582297Z return func(*args, **kwargs) 2023-01-11T22:51:00.3582548Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:51:00.3582664Z self.run_subtests( 2023-01-11T22:51:00.3583013Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.3583177Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.3583542Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.3583678Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.3584051Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.3584174Z output = model(*input) 2023-01-11T22:51:00.3584496Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.3584631Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.3585001Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.3585174Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.3585537Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.3585659Z _lazy_init(state, module) 2023-01-11T22:51:00.3585998Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.3586164Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.3586557Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 
178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.3586699Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.3587032Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.3587157Z return func(*args, **kwargs) 2023-01-11T22:51:00.3587530Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.3587690Z p_assert( 2023-01-11T22:51:00.3588011Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.3588137Z traceback.print_stack() 2023-01-11T22:51:00.3588270Z File "", line 1, in 2023-01-11T22:51:00.3588480Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.3588622Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.3588825Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.3588976Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.3589169Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.3589273Z self.run() 2023-01-11T22:51:00.3589472Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.3589615Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.3589951Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.3590087Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.3590493Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.3590623Z getattr(self, test_name)() 2023-01-11T22:51:00.3590967Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.3591065Z fn() 
2023-01-11T22:51:00.3591425Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.3591548Z test(self, **param_kwargs) 2023-01-11T22:51:00.3591901Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.3592023Z return func(*args, **kwargs) 2023-01-11T22:51:00.3592322Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:51:00.3592444Z self.run_subtests( 2023-01-11T22:51:00.3592781Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.3592943Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.3593303Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.3593453Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.3593826Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.3593944Z output = model(*input) 2023-01-11T22:51:00.3594267Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.3594402Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.3594763Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.3594936Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.3595302Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.3595421Z _lazy_init(state, module) 2023-01-11T22:51:00.3595773Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.3595939Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.3596333Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.3596474Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.3596792Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.3596976Z return func(*args, **kwargs) 2023-01-11T22:51:00.3597356Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.3597463Z p_assert( 2023-01-11T22:51:00.3597798Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.3597922Z traceback.print_stack() 2023-01-11T22:51:00.3598155Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3598388Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.3598501Z File "<string>", line 1, in <module> 2023-01-11T22:51:00.3598707Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.3598847Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.3599049Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.3599202Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.3599410Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.3599558Z self.run() 2023-01-11T22:51:00.3599749Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.3599896Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.3600238Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.3600374Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.3600734Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.3600859Z getattr(self, test_name)() 2023-01-11T22:51:00.3601218Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.3601321Z fn() 2023-01-11T22:51:00.3601665Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.3601788Z test(self, **param_kwargs) 2023-01-11T22:51:00.3602145Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.3602270Z return func(*args, **kwargs) 2023-01-11T22:51:00.3602518Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:51:00.3602631Z self.run_subtests( 2023-01-11T22:51:00.3602979Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.3603141Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.3603481Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.3603635Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.3604004Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.3604126Z output = model(*input) 2023-01-11T22:51:00.3604450Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.3604586Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.3604959Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.3605132Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.3605477Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.3605597Z _lazy_init(state, module) 2023-01-11T22:51:00.3605946Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.3606170Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.3606574Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.3606718Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.3607055Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.3607179Z return func(*args, **kwargs) 2023-01-11T22:51:00.3607551Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.3607637Z p_assert( 2023-01-11T22:51:00.3607969Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.3608094Z traceback.print_stack() 2023-01-11T22:51:00.3608224Z File "<string>", line 1, in <module> 2023-01-11T22:51:00.3608435Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.3608576Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.3608823Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.3608963Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.3609173Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.3609278Z self.run() 2023-01-11T22:51:00.3609478Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.3609624Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.3609964Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.3610094Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.3610452Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.3610562Z getattr(self, test_name)() 2023-01-11T22:51:00.3610920Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.3611018Z fn() 2023-01-11T22:51:00.3611380Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.3611502Z test(self, **param_kwargs) 2023-01-11T22:51:00.3611857Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.3611981Z return func(*args, **kwargs) 2023-01-11T22:51:00.3612229Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:51:00.3612324Z self.run_subtests( 2023-01-11T22:51:00.3612673Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.3612838Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.3613200Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.3613352Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.3613724Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.3613843Z output = model(*input) 2023-01-11T22:51:00.3614164Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.3614284Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.3614655Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.3614826Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.3615249Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.3615370Z _lazy_init(state, module) 2023-01-11T22:51:00.3615725Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.3615893Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.3616286Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.3616410Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.3616986Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.3617118Z return func(*args, **kwargs) 2023-01-11T22:51:00.3617502Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.3617611Z p_assert( 2023-01-11T22:51:00.3617943Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.3618068Z traceback.print_stack() 2023-01-11T22:51:00.3618393Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3618622Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3618754Z File "<string>", line 1, in <module> 2023-01-11T22:51:00.3618961Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.3619101Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.3619299Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.3619449Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.3619658Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.3619749Z self.run() 2023-01-11T22:51:00.3619948Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.3620092Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.3620439Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.3620573Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.3620933Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.3621057Z getattr(self, test_name)() 2023-01-11T22:51:00.3621412Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.3621493Z fn() 2023-01-11T22:51:00.3621853Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.3621976Z test(self, **param_kwargs) 2023-01-11T22:51:00.3622333Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.3622456Z return func(*args, **kwargs) 2023-01-11T22:51:00.3622708Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:51:00.3622822Z self.run_subtests( 2023-01-11T22:51:00.3623169Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.3623313Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.3623673Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.3623824Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.3624199Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.3624393Z output = model(*input) 2023-01-11T22:51:00.3624725Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.3624867Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.3625240Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.3625396Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.3625757Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.3625875Z _lazy_init(state, module) 2023-01-11T22:51:00.3626224Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.3626389Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.3626781Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.3626926Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.3627310Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.3627425Z return func(*args, **kwargs) 2023-01-11T22:51:00.3627801Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.3627904Z p_assert( 2023-01-11T22:51:00.3628237Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.3628361Z traceback.print_stack() 2023-01-11T22:51:00.3628493Z File "<string>", line 1, in <module> 2023-01-11T22:51:00.3628701Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.3628844Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.3629028Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.3629181Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.3629392Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.3629494Z self.run() 2023-01-11T22:51:00.3629693Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.3629838Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.3630174Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.3630305Z 
self.run_test(test_name, pipe) 2023-01-11T22:51:00.3630645Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.3630768Z getattr(self, test_name)() 2023-01-11T22:51:00.3631123Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.3631222Z fn() 2023-01-11T22:51:00.3631580Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.3631704Z test(self, **param_kwargs) 2023-01-11T22:51:00.3632055Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.3632162Z return func(*args, **kwargs) 2023-01-11T22:51:00.3632409Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:51:00.3632522Z self.run_subtests( 2023-01-11T22:51:00.3632871Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.3633031Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.3633386Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.3633592Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.3633971Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.3634091Z output = model(*input) 2023-01-11T22:51:00.3634397Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.3634535Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.3634910Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:51:00.3635082Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.3635447Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.3635566Z _lazy_init(state, module) 2023-01-11T22:51:00.3635913Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.3636083Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.3636503Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.3636655Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.3636995Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.3637120Z return func(*args, **kwargs) 2023-01-11T22:51:00.3637492Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.3637593Z p_assert( 2023-01-11T22:51:00.3637927Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.3638051Z traceback.print_stack() 2023-01-11T22:51:00.3638275Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3638507Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3639258Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.3639999Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3640131Z File "<string>", line 1, in <module> 2023-01-11T22:51:00.3640338Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.3640482Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.3640680Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.3640830Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.3641024Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.3641128Z self.run() 2023-01-11T22:51:00.3641327Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.3641474Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.3641814Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.3641946Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.3642304Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.3642483Z getattr(self, test_name)() 2023-01-11T22:51:00.3642823Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.3642920Z fn() 2023-01-11T22:51:00.3643286Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.3643409Z test(self, **param_kwargs) 2023-01-11T22:51:00.3643761Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 
167, in wrapper 2023-01-11T22:51:00.3643887Z return func(*args, **kwargs) 2023-01-11T22:51:00.3644137Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:51:00.3644249Z self.run_subtests( 2023-01-11T22:51:00.3644578Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.3644739Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.3645103Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.3645300Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.3645683Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.3645804Z output = model(*input) 2023-01-11T22:51:00.3646130Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.3646267Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.3646622Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.3646795Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.3647156Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.3647280Z _lazy_init(state, module) 2023-01-11T22:51:00.3647633Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.3647798Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.3648192Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.3648332Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.3648650Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.3648774Z return func(*args, **kwargs) 2023-01-11T22:51:00.3649147Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.3649246Z p_assert( 2023-01-11T22:51:00.3649577Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.3649707Z traceback.print_stack() 2023-01-11T22:51:00.3649834Z File "<string>", line 1, in <module> 2023-01-11T22:51:00.3650043Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.3650169Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.3650368Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.3650515Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.3650724Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.3650827Z self.run() 2023-01-11T22:51:00.3651027Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.3651171Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.3651492Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.3651718Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.3652084Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.3652207Z getattr(self, test_name)() 2023-01-11T22:51:00.3652564Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.3652661Z fn() 2023-01-11T22:51:00.3653024Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.3653146Z test(self, **param_kwargs) 2023-01-11T22:51:00.3653479Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.3653602Z return func(*args, **kwargs) 2023-01-11T22:51:00.3653851Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:51:00.3653967Z self.run_subtests( 2023-01-11T22:51:00.3654316Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.3654521Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.3654895Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.3655049Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.3655403Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.3655523Z output = model(*input) 2023-01-11T22:51:00.3655845Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.3655983Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.3656355Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.3656748Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.3657153Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.3657274Z _lazy_init(state, module) 2023-01-11T22:51:00.3657609Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 
2023-01-11T22:51:00.3657775Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.3658169Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.3658310Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.3658644Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.3658773Z return func(*args, **kwargs) 2023-01-11T22:51:00.3659147Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.3659248Z p_assert( 2023-01-11T22:51:00.3659571Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.3659695Z traceback.print_stack() 2023-01-11T22:51:00.3659933Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3660164Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3660393Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3660623Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3660851Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3661167Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3661394Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3661606Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3661830Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.3662053Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3662276Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3662498Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3662717Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3662939Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3663165Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3663427Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3663661Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3663886Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3664107Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3664330Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3665369Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_exec_order_utils.py:259: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 
2023-01-11T22:51:00.3665483Z world_indices[ 2023-01-11T22:51:00.3666492Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_exec_order_utils.py:259: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:51:00.3666598Z world_indices[ 2023-01-11T22:51:00.3666825Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3667049Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3667258Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3667484Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3667710Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3667931Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3668154Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3668374Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3668595Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3668816Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3669019Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3669298Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.3669520Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3669746Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3669965Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3670187Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3670407Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3670627Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3670831Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3671051Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3671274Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3671543Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3671770Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3671993Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3672739Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3673475Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3674216Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3674951Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3675679Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3676412Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3677140Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3677866Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3678650Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3679377Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3680099Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3680882Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3681613Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.3682333Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3683060Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3683776Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3684495Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3685220Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3685941Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3686657Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3687432Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3688149Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3688869Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3689635Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3690362Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3691080Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3691808Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3692583Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3693313Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3694038Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.3694759Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3695478Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3696266Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3697214Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3697946Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3698725Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3699460Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3700181Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3700910Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3701629Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3702345Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3703068Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3703784Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3704502Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3705289Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3706011Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3706730Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.3707500Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3708227Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3708946Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3709673Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3710393Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3711108Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3711834Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3712551Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3713270Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3714043Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3714765Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3715482Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3716245Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3716976Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3717694Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3717933Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3718171Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3718398Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3718625Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.3718737Z dist init r=1, world=2 2023-01-11T22:51:00.3719065Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 
2023-01-11T22:51:00.3719364Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.3719669Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.3719977Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.3720278Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.3720576Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.3720873Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.3721170Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.3721524Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.3721824Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.3722122Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 
2023-01-11T22:51:00.3722419Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.3722532Z dist init r=0, world=2 2023-01-11T22:51:00.3722835Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.3723192Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.3723505Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.3723805Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.3724105Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.3724404Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.3724706Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.3725004Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.3725308Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 
2023-01-11T22:51:00.3725608Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.3725907Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.3726205Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.3726293Z ok (6.214s) 2023-01-11T22:51:00.3726502Z test_delayed_reduce_scatter_offload_true_none (__main__.TestParityWithDDP) 2023-01-11T22:51:00.3727411Z Tests the FSDP forward, backward, and optimizer step runtime by ... skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/82399 for platform(s) linux, rocm. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (0.001s) 2023-01-11T22:51:00.3727632Z test_delayed_reduce_scatter_offload_true_shard_grad_op (__main__.TestParityWithDDP) 2023-01-11T22:51:00.3728508Z Tests the FSDP forward, backward, and optimizer step runtime by ... skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/82403 for platform(s) linux, rocm. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (0.001s) 2023-01-11T22:51:00.3728889Z test_mixture_of_experts_offload_false_no_shard (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 90340 2023-01-11T22:51:00.3729107Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 90341 2023-01-11T22:51:00.3729478Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.3729655Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.3730034Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.3730223Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.3730569Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.3730744Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.3731180Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.3731376Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.3731624Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.3731867Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.3732266Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.3732657Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:51:00.3732889Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.3733098Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.3734127Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.3734241Z warnings.warn( 2023-01-11T22:51:00.3735243Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.3735360Z warnings.warn( 2023-01-11T22:51:00.3735601Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:51:00.3735843Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:51:00.3736234Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:51:00.3737212Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3737710Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:51:00.3738455Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3738696Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:51:00.3738938Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:51:00.3739310Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:51:00.3739698Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:51:00.3739940Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2023-01-11T22:51:00.3740258Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2023-01-11T22:51:00.3740661Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:51:00.3741045Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 
2023-01-11T22:51:00.3741280Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 1 2023-01-11T22:51:00.3741517Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 0 2023-01-11T22:51:00.3741902Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:51:00.3742274Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:51:00.3742511Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 1 2023-01-11T22:51:00.3742746Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 0 2023-01-11T22:51:00.3743129Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:51:00.3743509Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:51:00.3743746Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 1 2023-01-11T22:51:00.3743980Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 0 2023-01-11T22:51:00.3744360Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:51:00.3744741Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 
2023-01-11T22:51:00.3744976Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 1 2023-01-11T22:51:00.3745193Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 0 2023-01-11T22:51:00.3745575Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:51:00.3745955Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:51:00.3746186Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 1 2023-01-11T22:51:00.3746419Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 0 2023-01-11T22:51:00.3746859Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:51:00.3747243Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:51:00.3747986Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3748721Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3749489Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3750229Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3750961Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3751701Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3752423Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3753152Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.3753886Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3754608Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3755330Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3756114Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3756838Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3757562Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3758329Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3759059Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3759781Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3760511Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3761233Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3761956Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3762685Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3763391Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3764113Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3764890Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3765613Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.3766331Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3767165Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3767912Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3768635Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3769363Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3770087Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3770805Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3771527Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3772248Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3772968Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3773687Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3774461Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3775184Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3775955Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3776902Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3777162Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 0 2023-01-11T22:51:00.3777402Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 1 2023-01-11T22:51:00.3777806Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:51:00.3778206Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 
2023-01-11T22:51:00.3778449Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 0 2023-01-11T22:51:00.3778689Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 1 2023-01-11T22:51:00.3779082Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:51:00.3779467Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:51:00.3779689Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 0 2023-01-11T22:51:00.3779921Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 1 2023-01-11T22:51:00.3780313Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:51:00.3780703Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:51:00.3780941Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 0 2023-01-11T22:51:00.3781174Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 1 2023-01-11T22:51:00.3781564Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:51:00.3781954Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 
2023-01-11T22:51:00.3782189Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 0 2023-01-11T22:51:00.3782401Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 1 2023-01-11T22:51:00.3782884Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2023-01-11T22:51:00.3783275Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2023-01-11T22:51:00.3783513Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 1 2023-01-11T22:51:00.3783744Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 0 2023-01-11T22:51:00.3784134Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:51:00.3784518Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:51:00.3784751Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 1 2023-01-11T22:51:00.3784985Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 0 2023-01-11T22:51:00.3785447Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2023-01-11T22:51:00.3785846Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 
2023-01-11T22:51:00.3786081Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 1 2023-01-11T22:51:00.3786310Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 0 2023-01-11T22:51:00.3786697Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 2023-01-11T22:51:00.3787084Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 2023-01-11T22:51:00.3787326Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 0 2023-01-11T22:51:00.3787558Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 1 2023-01-11T22:51:00.3787944Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:51:00.3788330Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:51:00.3788547Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 1 2023-01-11T22:51:00.3788777Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 0 2023-01-11T22:51:00.3789163Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2023-01-11T22:51:00.3789552Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 
2023-01-11T22:51:00.3789788Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 1 2023-01-11T22:51:00.3790016Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 0 2023-01-11T22:51:00.3790400Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:51:00.3790786Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:51:00.3791016Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 1 2023-01-11T22:51:00.3791229Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 0 2023-01-11T22:51:00.3791615Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:51:00.3792062Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:51:00.3792851Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3793588Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3794354Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3795103Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3795834Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3796563Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3797293Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3798019Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.3798745Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3799469Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3800190Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3800910Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3801690Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3802408Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3803184Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3803919Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3804638Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3805359Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3806083Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3806803Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3807523Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3808250Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3808968Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3809688Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3810459Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.3811182Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3811901Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3812673Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3813403Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3814123Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3814850Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3815571Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3816291Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3817248Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3817974Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3818697Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3819505Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3820234Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3820955Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3821728Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3822459Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3823177Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.3823907Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3824621Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3825343Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3826067Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3826784Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3827504Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3828283Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3829003Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3829722Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3830485Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3831211Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3831929Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3832656Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3833373Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3834095Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3834816Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3835534Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.3836251Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3837025Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3837743Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3838463Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3839221Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3839948Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3840193Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 0 2023-01-11T22:51:00.3840417Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 1 2023-01-11T22:51:00.3840815Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 2023-01-11T22:51:00.3841208Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 2023-01-11T22:51:00.3841928Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3842169Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 1 2023-01-11T22:51:00.3842403Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 0 2023-01-11T22:51:00.3842795Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:51:00.3843189Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:51:00.3843429Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 1 2023-01-11T22:51:00.3843663Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 0 2023-01-11T22:51:00.3844040Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 
2023-01-11T22:51:00.3844426Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 2023-01-11T22:51:00.3844660Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 1 2023-01-11T22:51:00.3844944Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 0 2023-01-11T22:51:00.3845340Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:51:00.3845726Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:51:00.3845838Z dist init r=1, world=2 2023-01-11T22:51:00.3845950Z dist init r=0, world=2 2023-01-11T22:51:00.3846033Z ok (6.716s) 2023-01-11T22:51:00.3846354Z test_mixture_of_experts_offload_false_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 90663 2023-01-11T22:51:00.3846574Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 90664 2023-01-11T22:51:00.3846942Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.3847118Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.3847491Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.3847728Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.3848104Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.3848278Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.3848639Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: 
UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.3848828Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.3849070Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.3849310Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.3849706Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.3850099Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.3850326Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.3850551Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.3851563Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.3851680Z warnings.warn( 2023-01-11T22:51:00.3852690Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:51:00.3852784Z warnings.warn( 2023-01-11T22:51:00.3853028Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:51:00.3853271Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:51:00.3853666Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:51:00.3854264Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2023-01-11T22:51:00.3854375Z warnings.warn( 2023-01-11T22:51:00.3855113Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3855506Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:51:00.3856031Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2023-01-11T22:51:00.3856145Z warnings.warn( 2023-01-11T22:51:00.3857169Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.3857414Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:51:00.3857654Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:51:00.3858057Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:51:00.3858445Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:51:00.3858683Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2023-01-11T22:51:00.3858925Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2023-01-11T22:51:00.3859316Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:51:00.3859700Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:51:00.3859936Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 0 2023-01-11T22:51:00.3860153Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 1 2023-01-11T22:51:00.3860534Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:51:00.3860916Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 
2023-01-11T22:51:00.3861155Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 0 2023-01-11T22:51:00.3861391Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 1 2023-01-11T22:51:00.3861773Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:51:00.3862152Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:51:00.3862387Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 0 2023-01-11T22:51:00.3862620Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 1 2023-01-11T22:51:00.3863003Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:51:00.3863444Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:51:00.3863684Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 0 2023-01-11T22:51:00.3863919Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 1 2023-01-11T22:51:00.3864299Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:51:00.3864679Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 
2023-01-11T22:51:00.3864916Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 0 2023-01-11T22:51:00.3865152Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 1 2023-01-11T22:51:00.3865532Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:51:00.3865920Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:51:00.3866720Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3867470Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3868190Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3868928Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.3869659Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3870375Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3871107Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3871831Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3872558Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3873341Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3874071Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3874796Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3875567Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3876302Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3877022Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3877751Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3878476Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3879199Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3879925Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3880646Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3881365Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.3882142Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3882863Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3883588Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3883878Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 0 2023-01-11T22:51:00.3884122Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 1 2023-01-11T22:51:00.3884521Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:51:00.3884916Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 
2023-01-11T22:51:00.3885159Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 0 2023-01-11T22:51:00.3885395Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 1 2023-01-11T22:51:00.3885792Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:51:00.3886169Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:51:00.3886408Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 1 2023-01-11T22:51:00.3886637Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 0 2023-01-11T22:51:00.3887025Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:51:00.3887411Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:51:00.3887647Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 1 2023-01-11T22:51:00.3887880Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 0 2023-01-11T22:51:00.3888270Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:51:00.3888658Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 
2023-01-11T22:51:00.3888892Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 1 2023-01-11T22:51:00.3889105Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 0 2023-01-11T22:51:00.3889493Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2023-01-11T22:51:00.3889880Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2023-01-11T22:51:00.3890168Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 1 2023-01-11T22:51:00.3890397Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 0 2023-01-11T22:51:00.3890789Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:51:00.3891177Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:51:00.3891412Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 1 2023-01-11T22:51:00.3891642Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 0 2023-01-11T22:51:00.3892009Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2023-01-11T22:51:00.3892444Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 
2023-01-11T22:51:00.3892685Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 1 2023-01-11T22:51:00.3892965Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 0 2023-01-11T22:51:00.3893365Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 2023-01-11T22:51:00.3893750Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 2023-01-11T22:51:00.3893983Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 1 2023-01-11T22:51:00.3894210Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 0 2023-01-11T22:51:00.3894596Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:51:00.3894968Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:51:00.3895205Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 1 2023-01-11T22:51:00.3895435Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 0 2023-01-11T22:51:00.3895819Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2023-01-11T22:51:00.3896203Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 
2023-01-11T22:51:00.3896434Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 1 2023-01-11T22:51:00.3896877Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 0 2023-01-11T22:51:00.3897287Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:51:00.3897676Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:51:00.3897911Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 1 2023-01-11T22:51:00.3898123Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 0 2023-01-11T22:51:00.3898509Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:51:00.3898893Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:51:00.3899630Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3900453Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3901173Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3901903Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3902686Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3903424Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3904148Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3904877Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.3905604Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3906327Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3907055Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3907778Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3908489Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3909286Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3910008Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3910730Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3911492Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3912222Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3912944Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3913669Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3914384Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3915104Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3915826Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3916544Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3917261Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.3918039Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3918756Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3919476Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3920237Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3920966Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3921685Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3922412Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3923131Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3923851Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3924572Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3925289Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3926003Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3926823Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3927543Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3928261Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3929034Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3929288Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 1 2023-01-11T22:51:00.3929528Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 0 2023-01-11T22:51:00.3929912Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 
2023-01-11T22:51:00.3930306Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 2023-01-11T22:51:00.3931039Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3931283Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 1 2023-01-11T22:51:00.3931516Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 0 2023-01-11T22:51:00.3931904Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:51:00.3932291Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:51:00.3932531Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 1 2023-01-11T22:51:00.3932769Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 0 2023-01-11T22:51:00.3933161Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 2023-01-11T22:51:00.3933551Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 
2023-01-11T22:51:00.3933770Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 0 2023-01-11T22:51:00.3934002Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 1 2023-01-11T22:51:00.3934390Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:51:00.3934778Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:51:00.3934943Z dist init r=1, world=2 2023-01-11T22:51:00.3935053Z dist init r=0, world=2 2023-01-11T22:51:00.3935154Z ok (7.016s) 2023-01-11T22:51:00.3935491Z test_mixture_of_experts_offload_false_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 90986 2023-01-11T22:51:00.3935695Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 90987 2023-01-11T22:51:00.3936069Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.3936242Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.3936827Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.3937015Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.3937401Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.3937598Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.3938042Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.3938243Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 
2023-01-11T22:51:00.3938466Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.3938708Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.3939108Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.3939499Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.3939725Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.3939956Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.3940973Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.3941087Z warnings.warn( 2023-01-11T22:51:00.3942092Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:51:00.3942211Z warnings.warn( 2023-01-11T22:51:00.3942452Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:51:00.3942673Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:51:00.3943063Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:51:00.3943601Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2023-01-11T22:51:00.3943711Z warnings.warn( 2023-01-11T22:51:00.3944452Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3944921Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:51:00.3945452Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2023-01-11T22:51:00.3945563Z warnings.warn( 2023-01-11T22:51:00.3946296Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.3946542Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:51:00.3946825Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:51:00.3947208Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:51:00.3947598Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:51:00.3947838Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2023-01-11T22:51:00.3948076Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2023-01-11T22:51:00.3948459Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:51:00.3948843Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:51:00.3949084Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 0 2023-01-11T22:51:00.3949319Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 1 2023-01-11T22:51:00.3949702Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:51:00.3950066Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 
2023-01-11T22:51:00.3950301Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 1 2023-01-11T22:51:00.3950536Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 0 2023-01-11T22:51:00.3950917Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:51:00.3951302Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:51:00.3951538Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 1 2023-01-11T22:51:00.3951771Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 0 2023-01-11T22:51:00.3952152Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:51:00.3952533Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:51:00.3952766Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 1 2023-01-11T22:51:00.3952983Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 0 2023-01-11T22:51:00.3953427Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:51:00.3953810Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 
2023-01-11T22:51:00.3954045Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 1 2023-01-11T22:51:00.3954279Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 0 2023-01-11T22:51:00.3954658Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:51:00.3955036Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:51:00.3955776Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3956560Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3957304Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3958035Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.3958770Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3959495Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3960218Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3960948Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3961672Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3962395Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3963174Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3963896Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3964620Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3965387Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3966121Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3966846Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3967575Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3968296Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3969016Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3969742Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3970460Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.3971180Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3971963Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3972665Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3972910Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 0 2023-01-11T22:51:00.3973146Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 1 2023-01-11T22:51:00.3973546Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:51:00.3973982Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 
2023-01-11T22:51:00.3974230Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 1 2023-01-11T22:51:00.3974465Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 0 2023-01-11T22:51:00.3974861Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:51:00.3975251Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:51:00.3975486Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 1 2023-01-11T22:51:00.3975705Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 0 2023-01-11T22:51:00.3976102Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:51:00.3976491Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:51:00.3976947Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 0 2023-01-11T22:51:00.3977187Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 1 2023-01-11T22:51:00.3977584Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:51:00.3977974Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 
2023-01-11T22:51:00.3978215Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 1 2023-01-11T22:51:00.3978449Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 0 2023-01-11T22:51:00.3978835Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2023-01-11T22:51:00.3979202Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2023-01-11T22:51:00.3979436Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 0 2023-01-11T22:51:00.3979667Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 1 2023-01-11T22:51:00.3980051Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:51:00.3980527Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:51:00.3980768Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 0 2023-01-11T22:51:00.3981002Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 1 2023-01-11T22:51:00.3981390Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2023-01-11T22:51:00.3981772Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 
2023-01-11T22:51:00.3981988Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 1 2023-01-11T22:51:00.3982219Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 0 2023-01-11T22:51:00.3982603Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 2023-01-11T22:51:00.3983050Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 2023-01-11T22:51:00.3983294Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 0 2023-01-11T22:51:00.3983524Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 1 2023-01-11T22:51:00.3983911Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:51:00.3984292Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:51:00.3984524Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 0 2023-01-11T22:51:00.3984737Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 1 2023-01-11T22:51:00.3985127Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2023-01-11T22:51:00.3985513Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 
2023-01-11T22:51:00.3985746Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 0 2023-01-11T22:51:00.3985975Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 1 2023-01-11T22:51:00.3986362Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:51:00.3986744Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:51:00.3986976Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 1 2023-01-11T22:51:00.3987209Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 0 2023-01-11T22:51:00.3987597Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:51:00.3987964Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:51:00.3988706Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3989436Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3990228Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3990962Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3991691Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3992518Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3993257Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3993982Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.3994711Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3995432Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3996155Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3996881Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3997602Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3998318Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3999099Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.3999817Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4000544Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4001304Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4002032Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4002747Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4003475Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4004191Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4004908Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4005632Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4006350Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4007064Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4007839Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4008560Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4009280Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4010045Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4010772Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4011490Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4012217Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4012929Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4013647Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4014369Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4015089Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4015804Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4016786Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4017530Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4018253Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4018502Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 0 2023-01-11T22:51:00.4018969Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 
2023-01-11T22:51:00.4019223Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 1 2023-01-11T22:51:00.4019621Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 2023-01-11T22:51:00.4020348Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4020572Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 1 2023-01-11T22:51:00.4020804Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 0 2023-01-11T22:51:00.4021198Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:51:00.4021592Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:51:00.4021831Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 0 2023-01-11T22:51:00.4022218Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 2023-01-11T22:51:00.4022455Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 1 2023-01-11T22:51:00.4022845Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 
2023-01-11T22:51:00.4023084Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 1 2023-01-11T22:51:00.4023316Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 0 2023-01-11T22:51:00.4023687Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:51:00.4024075Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:51:00.4024188Z dist init r=1, world=2 2023-01-11T22:51:00.4024296Z dist init r=0, world=2 2023-01-11T22:51:00.4024397Z ok (6.816s) 2023-01-11T22:51:00.4024721Z test_mixture_of_experts_offload_true_no_shard (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 91309 2023-01-11T22:51:00.4025011Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 91310 2023-01-11T22:51:00.4025386Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.4025548Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.4025928Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.4026119Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.4026485Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.4026658Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.4027030Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.4027217Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 
2023-01-11T22:51:00.4027463Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.4027731Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.4028135Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.4028528Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.4028755Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.4028978Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.4029995Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.4030113Z warnings.warn( 2023-01-11T22:51:00.4031123Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:51:00.4031231Z warnings.warn( 2023-01-11T22:51:00.4031471Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:51:00.4031714Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:51:00.4032107Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:51:00.4032832Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4032963Z File "", line 1, in 2023-01-11T22:51:00.4033172Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4033314Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4033516Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4033664Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4033933Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4034038Z self.run() 2023-01-11T22:51:00.4034225Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4034374Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4034717Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4034849Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4035210Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4035334Z getattr(self, test_name)() 2023-01-11T22:51:00.4035691Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4035789Z fn() 2023-01-11T22:51:00.4036132Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4036258Z test(self, **param_kwargs) 2023-01-11T22:51:00.4036660Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4036792Z return func(*args, **kwargs) 2023-01-11T22:51:00.4037038Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4037151Z self.run_subtests( 2023-01-11T22:51:00.4037504Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4037665Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4038004Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4038154Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4038528Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4038647Z output = model(*input) 2023-01-11T22:51:00.4038973Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4039111Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4039487Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4039659Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4040005Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.4040125Z _lazy_init(state, module) 2023-01-11T22:51:00.4040473Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4040639Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4041035Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4041179Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4041517Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4041640Z return func(*args, **kwargs) 2023-01-11T22:51:00.4041995Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4042097Z p_assert( 2023-01-11T22:51:00.4042429Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4042555Z traceback.print_stack() 2023-01-11T22:51:00.4042946Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:51:00.4043747Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4043879Z File "", line 1, in 2023-01-11T22:51:00.4044087Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4044229Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4044413Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4044563Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4044776Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4044878Z self.run() 2023-01-11T22:51:00.4045078Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4045227Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4045567Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4045745Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4046100Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4046223Z getattr(self, test_name)() 2023-01-11T22:51:00.4046582Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4046679Z fn() 2023-01-11T22:51:00.4047038Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4047160Z test(self, **param_kwargs) 2023-01-11T22:51:00.4047514Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4047625Z return func(*args, **kwargs) 2023-01-11T22:51:00.4047871Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4047986Z self.run_subtests( 2023-01-11T22:51:00.4048337Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4048497Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4048855Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4049006Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4049377Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4049495Z output = model(*input) 2023-01-11T22:51:00.4049802Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4049942Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4050317Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4050490Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4050853Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4050973Z _lazy_init(state, module) 2023-01-11T22:51:00.4051322Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4051489Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4051866Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4052007Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4052403Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4052526Z return func(*args, **kwargs) 2023-01-11T22:51:00.4052907Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4053010Z p_assert( 2023-01-11T22:51:00.4053346Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4053471Z traceback.print_stack() 2023-01-11T22:51:00.4053698Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:51:00.4053939Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:51:00.4054332Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:51:00.4055369Z /opt/conda/lib/python3.10/site-packages/torch/_tensor.py:795: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 
2023-01-11T22:51:00.4055552Z return torch._VF.split_with_sizes(self, split_size, dim) 2023-01-11T22:51:00.4055681Z File "", line 1, in 2023-01-11T22:51:00.4055890Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4056035Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4056235Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4056368Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4056797Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4056919Z self.run() 2023-01-11T22:51:00.4057124Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4057272Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4057626Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4057760Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4058122Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4058227Z getattr(self, test_name)() 2023-01-11T22:51:00.4058582Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4058679Z fn() 2023-01-11T22:51:00.4059040Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4059167Z test(self, **param_kwargs) 2023-01-11T22:51:00.4059520Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4059646Z return func(*args, **kwargs) 2023-01-11T22:51:00.4059890Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4059985Z self.run_subtests( 
2023-01-11T22:51:00.4060333Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4060492Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4060849Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4060999Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4061370Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4061575Z output = model(*input) 2023-01-11T22:51:00.4061910Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4062031Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4062409Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4062582Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4062944Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4063063Z _lazy_init(state, module) 2023-01-11T22:51:00.4063411Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4063578Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4063974Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4064098Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4064531Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4064666Z return func(*args, **kwargs) 2023-01-11T22:51:00.4065047Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4065150Z p_assert( 2023-01-11T22:51:00.4065484Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4065610Z traceback.print_stack() 2023-01-11T22:51:00.4066006Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:51:00.4067047Z /opt/conda/lib/python3.10/site-packages/torch/_tensor.py:795: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:51:00.4067215Z return torch._VF.split_with_sizes(self, split_size, dim) 2023-01-11T22:51:00.4067344Z File "", line 1, in 2023-01-11T22:51:00.4067554Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4067697Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4067898Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4068048Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4068258Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4068364Z self.run() 2023-01-11T22:51:00.4068547Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4068695Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4069039Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4069170Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4069534Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4069658Z getattr(self, test_name)() 2023-01-11T22:51:00.4070016Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4070115Z fn() 2023-01-11T22:51:00.4070460Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4070649Z test(self, **param_kwargs) 2023-01-11T22:51:00.4071005Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4071130Z return func(*args, **kwargs) 2023-01-11T22:51:00.4071380Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4071495Z self.run_subtests( 2023-01-11T22:51:00.4071847Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4071990Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4072350Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4072503Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4072873Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4072995Z output = model(*input) 2023-01-11T22:51:00.4073316Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4073502Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4073886Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4074060Z args, kwargs = 
_root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4074410Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4074531Z _lazy_init(state, module) 2023-01-11T22:51:00.4074883Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4075048Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4075443Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4075589Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4075929Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4076054Z return func(*args, **kwargs) 2023-01-11T22:51:00.4076413Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4076517Z p_assert( 2023-01-11T22:51:00.4076848Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4076973Z traceback.print_stack() 2023-01-11T22:51:00.4077217Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2023-01-11T22:51:00.4077457Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2023-01-11T22:51:00.4077858Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:51:00.4078252Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:51:00.4079000Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4079742Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4079855Z File "", line 1, in 2023-01-11T22:51:00.4080119Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4080261Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4080467Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4080618Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4080828Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4080931Z self.run() 2023-01-11T22:51:00.4081113Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4081259Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4081599Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4081731Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4082090Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4082220Z getattr(self, test_name)() 2023-01-11T22:51:00.4082636Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4082742Z fn() 2023-01-11T22:51:00.4083089Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in 
instantiated_test 2023-01-11T22:51:00.4083211Z test(self, **param_kwargs) 2023-01-11T22:51:00.4083566Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4083690Z return func(*args, **kwargs) 2023-01-11T22:51:00.4083933Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4084046Z self.run_subtests( 2023-01-11T22:51:00.4084394Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4084559Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4084903Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4085057Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4085429Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4085546Z output = model(*input) 2023-01-11T22:51:00.4085870Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4086005Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4086378Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4086551Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4086901Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4087021Z _lazy_init(state, module) 2023-01-11T22:51:00.4087375Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4087540Z _share_state_and_init_handle_attrs(state, root_module) 
2023-01-11T22:51:00.4087934Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4088076Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4088410Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4088534Z return func(*args, **kwargs) 2023-01-11T22:51:00.4088908Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4089051Z p_assert( 2023-01-11T22:51:00.4089387Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4089513Z traceback.print_stack() 2023-01-11T22:51:00.4089645Z File "", line 1, in 2023-01-11T22:51:00.4089850Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4089991Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4090192Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4090324Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4090533Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4090637Z self.run() 2023-01-11T22:51:00.4090837Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4090981Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4091322Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4091453Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4091857Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4091969Z getattr(self, test_name)() 2023-01-11T22:51:00.4092329Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4092479Z fn() 2023-01-11T22:51:00.4092845Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4092968Z test(self, **param_kwargs) 2023-01-11T22:51:00.4093325Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4093450Z return func(*args, **kwargs) 2023-01-11T22:51:00.4093700Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4093796Z self.run_subtests( 2023-01-11T22:51:00.4094149Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4094307Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4094665Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4094815Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4095186Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4095302Z output = model(*input) 2023-01-11T22:51:00.4095626Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4095747Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4096123Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4096298Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4096890Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.4097018Z _lazy_init(state, module) 2023-01-11T22:51:00.4097383Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4097552Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4097950Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4098073Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4098410Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4098625Z return func(*args, **kwargs) 2023-01-11T22:51:00.4099007Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4099111Z p_assert( 2023-01-11T22:51:00.4099451Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4099578Z traceback.print_stack() 2023-01-11T22:51:00.4099823Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 0 2023-01-11T22:51:00.4100048Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 1 2023-01-11T22:51:00.4100449Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:51:00.4101191Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4101391Z File "", line 1, in 2023-01-11T22:51:00.4101611Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4101754Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4101956Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4102113Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4102325Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4102412Z self.run() 2023-01-11T22:51:00.4102614Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4102760Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4103108Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4103240Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4103604Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4103729Z getattr(self, test_name)() 2023-01-11T22:51:00.4104068Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4104167Z fn() 2023-01-11T22:51:00.4104529Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4104651Z test(self, **param_kwargs) 2023-01-11T22:51:00.4105004Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4105128Z return func(*args, **kwargs) 2023-01-11T22:51:00.4105372Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4105489Z self.run_subtests( 2023-01-11T22:51:00.4105825Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4105984Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4106342Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4106492Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4106863Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4106980Z output = model(*input) 2023-01-11T22:51:00.4107303Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4107440Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4107856Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4108033Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4108396Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4108515Z _lazy_init(state, module) 2023-01-11T22:51:00.4108864Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4109028Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4109421Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4109562Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4109897Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4110010Z return func(*args, **kwargs) 2023-01-11T22:51:00.4110428Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4110538Z p_assert( 2023-01-11T22:51:00.4110878Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4111003Z traceback.print_stack() 2023-01-11T22:51:00.4111398Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:51:00.4112139Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4112269Z File "", line 1, in 2023-01-11T22:51:00.4112464Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4112605Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4112806Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4112954Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4113165Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4113268Z self.run() 2023-01-11T22:51:00.4113467Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4113612Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4113934Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4114066Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4114424Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4114551Z getattr(self, test_name)() 2023-01-11T22:51:00.4114909Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4115007Z fn() 2023-01-11T22:51:00.4115365Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4115485Z test(self, **param_kwargs) 2023-01-11T22:51:00.4115819Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4115944Z return func(*args, **kwargs) 2023-01-11T22:51:00.4116188Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4116299Z self.run_subtests( 2023-01-11T22:51:00.4116647Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4116864Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4117229Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4117380Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4117732Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4117850Z output = model(*input) 2023-01-11T22:51:00.4118172Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4118308Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4118678Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4118849Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4119217Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.4119337Z _lazy_init(state, module) 2023-01-11T22:51:00.4119728Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4119903Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4120301Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4120443Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4120781Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4120908Z return func(*args, **kwargs) 2023-01-11T22:51:00.4121282Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4121391Z p_assert( 2023-01-11T22:51:00.4121708Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4121833Z traceback.print_stack() 2023-01-11T22:51:00.4122083Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 0 2023-01-11T22:51:00.4122324Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 1 2023-01-11T22:51:00.4122722Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:51:00.4123465Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4123596Z File "", line 1, in 2023-01-11T22:51:00.4123809Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4123952Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4124138Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4124286Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4124496Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4124599Z self.run() 2023-01-11T22:51:00.4124799Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4124942Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4125281Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4125413Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4125756Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4125935Z getattr(self, test_name)() 2023-01-11T22:51:00.4126301Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4126401Z fn() 2023-01-11T22:51:00.4126765Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4126887Z test(self, **param_kwargs) 2023-01-11T22:51:00.4127241Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4127366Z return func(*args, **kwargs) 2023-01-11T22:51:00.4127592Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4127704Z self.run_subtests( 2023-01-11T22:51:00.4128054Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4128216Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4128618Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4128779Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4129156Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4129275Z output = model(*input) 2023-01-11T22:51:00.4129581Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4129718Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4130094Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4130269Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4130637Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4130757Z _lazy_init(state, module) 2023-01-11T22:51:00.4131109Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4131276Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4131653Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4131793Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4132127Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4132251Z return func(*args, **kwargs) 2023-01-11T22:51:00.4132623Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4132732Z p_assert( 2023-01-11T22:51:00.4133066Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4133190Z traceback.print_stack() 2023-01-11T22:51:00.4133572Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:51:00.4134314Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4134445Z File "", line 1, in 2023-01-11T22:51:00.4134650Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4134790Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4134989Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4135194Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4135409Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4135496Z self.run() 2023-01-11T22:51:00.4135697Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4135846Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4136191Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4136322Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4136912Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4137045Z getattr(self, test_name)() 2023-01-11T22:51:00.4137410Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4137496Z fn() 2023-01-11T22:51:00.4137857Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4138052Z test(self, **param_kwargs) 2023-01-11T22:51:00.4138420Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4138546Z return func(*args, **kwargs) 2023-01-11T22:51:00.4138789Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4138903Z self.run_subtests( 2023-01-11T22:51:00.4139253Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4139396Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4139750Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4139905Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4140280Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4140398Z output = model(*input) 2023-01-11T22:51:00.4140721Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4140858Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4141232Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4141388Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4141751Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.4141870Z _lazy_init(state, module) 2023-01-11T22:51:00.4142219Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4142388Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4142787Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4142928Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4143265Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4143371Z return func(*args, **kwargs) 2023-01-11T22:51:00.4143747Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4143849Z p_assert( 2023-01-11T22:51:00.4144183Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4144308Z traceback.print_stack() 2023-01-11T22:51:00.4144628Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 0 2023-01-11T22:51:00.4144871Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 1 2023-01-11T22:51:00.4145276Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:51:00.4145670Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:51:00.4146394Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4147135Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4147318Z File "", line 1, in 2023-01-11T22:51:00.4147535Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4147678Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4147880Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4148028Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4148239Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4148343Z self.run() 2023-01-11T22:51:00.4148528Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4148674Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4149016Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4149155Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4149519Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4149641Z getattr(self, test_name)() 2023-01-11T22:51:00.4150000Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4150098Z fn() 2023-01-11T22:51:00.4150443Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4150567Z test(self, **param_kwargs) 2023-01-11T22:51:00.4150921Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 
167, in wrapper 2023-01-11T22:51:00.4151045Z return func(*args, **kwargs) 2023-01-11T22:51:00.4151289Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4151405Z self.run_subtests( 2023-01-11T22:51:00.4151756Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4151915Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4152258Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4152408Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4152778Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4152895Z output = model(*input) 2023-01-11T22:51:00.4153217Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4153354Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4153788Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4153962Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4154315Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4154434Z _lazy_init(state, module) 2023-01-11T22:51:00.4154783Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4154950Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4155343Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4155483Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.4155821Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4155950Z return func(*args, **kwargs) 2023-01-11T22:51:00.4156308Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4156457Z p_assert( 2023-01-11T22:51:00.4156806Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4156932Z traceback.print_stack() 2023-01-11T22:51:00.4157062Z File "", line 1, in 2023-01-11T22:51:00.4157267Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4157406Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4157607Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4157741Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4157948Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4158055Z self.run() 2023-01-11T22:51:00.4158255Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4158399Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4158741Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4158874Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4159215Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4159338Z getattr(self, test_name)() 2023-01-11T22:51:00.4159693Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4159793Z fn() 2023-01-11T22:51:00.4160156Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4160277Z test(self, **param_kwargs) 2023-01-11T22:51:00.4160635Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4160759Z return func(*args, **kwargs) 2023-01-11T22:51:00.4160988Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4161100Z self.run_subtests( 2023-01-11T22:51:00.4161447Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4161606Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4161963Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4162114Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4162484Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4162659Z output = model(*input) 2023-01-11T22:51:00.4162969Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4163116Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4163496Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4163671Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4164035Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4164153Z _lazy_init(state, module) 2023-01-11T22:51:00.4164504Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 
2023-01-11T22:51:00.4164671Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4165067Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4165195Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4165576Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4165708Z return func(*args, **kwargs) 2023-01-11T22:51:00.4166088Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4166191Z p_assert( 2023-01-11T22:51:00.4166524Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4166650Z traceback.print_stack() 2023-01-11T22:51:00.4166875Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 0 2023-01-11T22:51:00.4167117Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 1 2023-01-11T22:51:00.4167513Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:51:00.4168263Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4168395Z File "", line 1, in 2023-01-11T22:51:00.4168604Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4168743Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4168944Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4169092Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4169285Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4169391Z self.run() 2023-01-11T22:51:00.4169590Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4169735Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4170077Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4170208Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4170567Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4170689Z getattr(self, test_name)() 2023-01-11T22:51:00.4171028Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4171125Z fn() 2023-01-11T22:51:00.4171484Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4171605Z test(self, **param_kwargs) 2023-01-11T22:51:00.4172017Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4172143Z return func(*args, **kwargs) 2023-01-11T22:51:00.4172392Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4172507Z self.run_subtests( 2023-01-11T22:51:00.4172838Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4172998Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4173354Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4173505Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4173874Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4173995Z output = model(*input) 2023-01-11T22:51:00.4174319Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4174455Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4174854Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4175036Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4175406Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4175527Z _lazy_init(state, module) 2023-01-11T22:51:00.4175877Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4176044Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4176440Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4176814Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4177161Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4177288Z return func(*args, **kwargs) 2023-01-11T22:51:00.4177667Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4177770Z p_assert( 2023-01-11T22:51:00.4178103Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4178228Z traceback.print_stack() 2023-01-11T22:51:00.4178624Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:51:00.4179362Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4179497Z File "", line 1, in 2023-01-11T22:51:00.4179690Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4179829Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4180030Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4180178Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4180388Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4180491Z self.run() 2023-01-11T22:51:00.4180689Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4180834Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4181158Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4181379Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4181751Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4181875Z getattr(self, test_name)() 2023-01-11T22:51:00.4182238Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4182337Z fn() 2023-01-11T22:51:00.4182697Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4182801Z test(self, **param_kwargs) 2023-01-11T22:51:00.4183155Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4183278Z return func(*args, **kwargs) 2023-01-11T22:51:00.4183522Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4183638Z self.run_subtests( 2023-01-11T22:51:00.4184062Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4184234Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4184595Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4184730Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4185102Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4185220Z output = model(*input) 2023-01-11T22:51:00.4185544Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4185680Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4186059Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4186230Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4186594Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.4186714Z _lazy_init(state, module) 2023-01-11T22:51:00.4187048Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4187212Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4187608Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4187748Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4188083Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4188210Z return func(*args, **kwargs) 2023-01-11T22:51:00.4188583Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4188689Z p_assert( 2023-01-11T22:51:00.4189006Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4189130Z traceback.print_stack() 2023-01-11T22:51:00.4189373Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 0 2023-01-11T22:51:00.4189613Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 1 2023-01-11T22:51:00.4190009Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:51:00.4190751Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4190942Z File "", line 1, in 2023-01-11T22:51:00.4191154Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4191296Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4191481Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4191631Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4191842Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4191945Z self.run() 2023-01-11T22:51:00.4192146Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4192289Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4192685Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4192804Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4193220Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4193351Z getattr(self, test_name)() 2023-01-11T22:51:00.4193710Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4193808Z fn() 2023-01-11T22:51:00.4194171Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4194292Z test(self, **param_kwargs) 2023-01-11T22:51:00.4194646Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4194753Z return func(*args, **kwargs) 2023-01-11T22:51:00.4194999Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4195116Z self.run_subtests( 2023-01-11T22:51:00.4195466Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4195625Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4195983Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4196133Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4196504Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4196606Z output = model(*input) 2023-01-11T22:51:00.4196932Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4197072Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4197444Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4197624Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4197988Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4198108Z _lazy_init(state, module) 2023-01-11T22:51:00.4198458Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4198607Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4199002Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4199144Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4199478Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4199662Z return func(*args, **kwargs) 2023-01-11T22:51:00.4200042Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4200150Z p_assert( 2023-01-11T22:51:00.4200485Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4200594Z traceback.print_stack() 2023-01-11T22:51:00.4200992Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:51:00.4201736Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4201866Z File "", line 1, in 2023-01-11T22:51:00.4202077Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4202218Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4202463Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4202620Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4202832Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4202919Z self.run() 2023-01-11T22:51:00.4203122Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4203266Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4203605Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4203738Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4204100Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4204226Z getattr(self, test_name)() 2023-01-11T22:51:00.4204582Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4204666Z fn() 2023-01-11T22:51:00.4205027Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4205149Z test(self, **param_kwargs) 2023-01-11T22:51:00.4205508Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4205631Z return func(*args, **kwargs) 2023-01-11T22:51:00.4205873Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4205989Z self.run_subtests( 2023-01-11T22:51:00.4206338Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4206484Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4206841Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4206996Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4207367Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4207486Z output = model(*input) 2023-01-11T22:51:00.4207810Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4207949Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4208321Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4208478Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4208845Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.4209057Z _lazy_init(state, module) 2023-01-11T22:51:00.4209416Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4209582Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4209977Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4210118Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4210454Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4210562Z return func(*args, **kwargs) 2023-01-11T22:51:00.4210935Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4211036Z p_assert( 2023-01-11T22:51:00.4211373Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4211497Z traceback.print_stack() 2023-01-11T22:51:00.4211788Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 1 2023-01-11T22:51:00.4212034Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 0 2023-01-11T22:51:00.4212435Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:51:00.4212810Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:51:00.4213554Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4214305Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4215042Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4215777Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4216510Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4217489Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4218224Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4219048Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4219778Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4220503Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4220638Z File "", line 1, in 2023-01-11T22:51:00.4220903Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4221055Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4221259Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4221411Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4221627Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4221713Z self.run() 2023-01-11T22:51:00.4221916Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4222064Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4222407Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4222541Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4222903Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4223029Z getattr(self, test_name)() 2023-01-11T22:51:00.4223390Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4223472Z fn() 2023-01-11T22:51:00.4223832Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4223954Z test(self, **param_kwargs) 2023-01-11T22:51:00.4224305Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4224429Z return func(*args, **kwargs) 2023-01-11T22:51:00.4224675Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4224787Z self.run_subtests( 2023-01-11T22:51:00.4225140Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4225283Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4225414Z File "", line 1, in 2023-01-11T22:51:00.4225775Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4225928Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4226302Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4226420Z output = model(*input) 2023-01-11T22:51:00.4226629Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4226768Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4227079Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4227271Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4227474Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4227627Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4228003Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4228175Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4228386Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4228473Z self.run() 2023-01-11T22:51:00.4228839Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4228958Z _lazy_init(state, module) 2023-01-11T22:51:00.4229158Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4229306Z self._target(*self._args, **self._kwargs) 
2023-01-11T22:51:00.4229657Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4229869Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4230216Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4230332Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4230727Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4230868Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4231224Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4231347Z getattr(self, test_name)() 2023-01-11T22:51:00.4231682Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4231810Z return func(*args, **kwargs) 2023-01-11T22:51:00.4232169Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4232250Z fn() 2023-01-11T22:51:00.4232623Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4232725Z p_assert( 2023-01-11T22:51:00.4233087Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4233208Z test(self, **param_kwargs) 2023-01-11T22:51:00.4233539Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4233664Z traceback.print_stack() 2023-01-11T22:51:00.4234017Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4234128Z return func(*args, 
**kwargs) 2023-01-11T22:51:00.4234372Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4234489Z self.run_subtests( 2023-01-11T22:51:00.4234839Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4234998Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4235356Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4235507Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4235877Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4235978Z output = model(*input) 2023-01-11T22:51:00.4236300Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4236494Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4236871Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4237044Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4237411Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4237530Z _lazy_init(state, module) 2023-01-11T22:51:00.4237880Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4238028Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4238420Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4238560Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4238901Z File 
"/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4239026Z return func(*args, **kwargs) 2023-01-11T22:51:00.4239440Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4239549Z p_assert( 2023-01-11T22:51:00.4239887Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4239996Z traceback.print_stack() 2023-01-11T22:51:00.4240240Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 0 2023-01-11T22:51:00.4240476Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 1 2023-01-11T22:51:00.4240872Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:51:00.4241270Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:51:00.4242013Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4242747Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4242878Z File "", line 1, in 2023-01-11T22:51:00.4243078Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4243228Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4243429Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4243579Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4243795Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4243900Z self.run() 2023-01-11T22:51:00.4244105Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4244233Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4244574Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4244709Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4245069Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4245190Z getattr(self, test_name)() 2023-01-11T22:51:00.4245550Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4245703Z fn() 2023-01-11T22:51:00.4246055Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4246178Z test(self, **param_kwargs) 2023-01-11T22:51:00.4246532Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4246658Z return func(*args, **kwargs) 2023-01-11T22:51:00.4246902Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4247021Z self.run_subtests( 2023-01-11T22:51:00.4247375Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4247539Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4247882Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4248035Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4248473Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4248598Z output = model(*input) 2023-01-11T22:51:00.4248926Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4249063Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4249436Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4249608Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4249974Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4250082Z _lazy_init(state, module) 2023-01-11T22:51:00.4250433Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4250604Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4250999Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4251141Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4251478Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4251603Z return func(*args, **kwargs) 2023-01-11T22:51:00.4251977Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4252062Z p_assert( 2023-01-11T22:51:00.4252398Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4252525Z traceback.print_stack() 2023-01-11T22:51:00.4252654Z File "", line 1, in 2023-01-11T22:51:00.4252864Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4253007Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4253206Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4253337Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4253548Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4253655Z self.run() 2023-01-11T22:51:00.4253854Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4254005Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4254341Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4254475Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4254893Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4254999Z getattr(self, test_name)() 2023-01-11T22:51:00.4255359Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4255459Z fn() 2023-01-11T22:51:00.4255822Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4255945Z test(self, **param_kwargs) 2023-01-11T22:51:00.4256297Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.4256422Z return func(*args, **kwargs) 2023-01-11T22:51:00.4256895Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4256999Z self.run_subtests( 2023-01-11T22:51:00.4257364Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4257525Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4257959Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4258121Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4258497Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4258616Z output = model(*input) 2023-01-11T22:51:00.4258940Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4259059Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4259432Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4259612Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4259983Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4260104Z _lazy_init(state, module) 2023-01-11T22:51:00.4260450Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4260619Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4261015Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4261139Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.4261474Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4261599Z return func(*args, **kwargs) 2023-01-11T22:51:00.4261974Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4262079Z p_assert( 2023-01-11T22:51:00.4262417Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4262545Z traceback.print_stack() 2023-01-11T22:51:00.4262787Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 0 2023-01-11T22:51:00.4263007Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 1 2023-01-11T22:51:00.4263401Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:51:00.4263799Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:51:00.4264540Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4265371Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4265503Z File "", line 1, in 2023-01-11T22:51:00.4265713Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4265856Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4266060Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4266209Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4266406Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4266510Z self.run() 2023-01-11T22:51:00.4266756Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4266907Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4267249Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4267383Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4267741Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4267864Z getattr(self, test_name)() 2023-01-11T22:51:00.4268205Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4268304Z fn() 2023-01-11T22:51:00.4268663Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4268788Z test(self, **param_kwargs) 2023-01-11T22:51:00.4269148Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4269270Z return func(*args, **kwargs) 2023-01-11T22:51:00.4269515Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4269627Z self.run_subtests( 2023-01-11T22:51:00.4269957Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4270122Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4270480Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4270634Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4271004Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4271126Z output = model(*input) 2023-01-11T22:51:00.4271457Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4271594Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4271954Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4272126Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4272489Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4272610Z _lazy_init(state, module) 2023-01-11T22:51:00.4272960Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4273125Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4273579Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4273726Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4274047Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4274175Z return func(*args, **kwargs) 2023-01-11T22:51:00.4274550Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4274651Z p_assert( 2023-01-11T22:51:00.4274985Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4275108Z traceback.print_stack() 2023-01-11T22:51:00.4275236Z File "", line 1, in 2023-01-11T22:51:00.4275441Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4275568Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4275768Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4275964Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4276183Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4276287Z self.run() 2023-01-11T22:51:00.4276488Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4276634Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4276955Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4277093Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4277452Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4277574Z getattr(self, test_name)() 2023-01-11T22:51:00.4277937Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4278036Z fn() 2023-01-11T22:51:00.4278404Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4278525Z test(self, **param_kwargs) 2023-01-11T22:51:00.4278860Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.4278981Z return func(*args, **kwargs) 2023-01-11T22:51:00.4279228Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4279339Z self.run_subtests( 2023-01-11T22:51:00.4279686Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4279847Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4280209Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4280361Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4280717Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4280839Z output = model(*input) 2023-01-11T22:51:00.4281163Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4281301Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4281674Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4281847Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4282211Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4282386Z _lazy_init(state, module) 2023-01-11T22:51:00.4282723Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4282894Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4283290Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4283433Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.4283770Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4283896Z return func(*args, **kwargs) 2023-01-11T22:51:00.4284272Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4284373Z p_assert( 2023-01-11T22:51:00.4284689Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4284817Z traceback.print_stack() 2023-01-11T22:51:00.4285060Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 0 2023-01-11T22:51:00.4285346Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 1 2023-01-11T22:51:00.4285755Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:51:00.4286146Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:51:00.4286890Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4287634Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4287771Z File "", line 1, in 2023-01-11T22:51:00.4287984Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4288108Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4288312Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4288464Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4288676Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4288780Z self.run() 2023-01-11T22:51:00.4288982Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4289132Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4289474Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4289593Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4289954Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4290077Z getattr(self, test_name)() 2023-01-11T22:51:00.4290434Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4290533Z fn() 2023-01-11T22:51:00.4290892Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4291017Z test(self, **param_kwargs) 2023-01-11T22:51:00.4291369Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4291531Z return func(*args, **kwargs) 2023-01-11T22:51:00.4291776Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4291889Z self.run_subtests( 2023-01-11T22:51:00.4292248Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4292407Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4292824Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4292975Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4293351Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4293454Z output = model(*input) 2023-01-11T22:51:00.4293778Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4293920Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4294294Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4294518Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4294892Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4295015Z _lazy_init(state, module) 2023-01-11T22:51:00.4295365Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4295515Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4295907Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4296048Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4296390Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4296516Z return func(*args, **kwargs) 2023-01-11T22:51:00.4297133Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4297239Z p_assert( 2023-01-11T22:51:00.4297581Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4297692Z traceback.print_stack() 2023-01-11T22:51:00.4297821Z File "", line 1, in 2023-01-11T22:51:00.4298026Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4298168Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4298365Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4298512Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4298728Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4298834Z self.run() 2023-01-11T22:51:00.4299020Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4299166Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4299507Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4299639Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4299998Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4300121Z getattr(self, test_name)() 2023-01-11T22:51:00.4300478Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4300559Z fn() 2023-01-11T22:51:00.4300922Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4301131Z test(self, **param_kwargs) 2023-01-11T22:51:00.4301491Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.4301616Z return func(*args, **kwargs) 2023-01-11T22:51:00.4301861Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4301974Z self.run_subtests( 2023-01-11T22:51:00.4302325Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4302469Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4302826Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4302975Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4303345Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4303465Z output = model(*input) 2023-01-11T22:51:00.4303847Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4303994Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4304374Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4304528Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4304895Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4305016Z _lazy_init(state, module) 2023-01-11T22:51:00.4305368Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4305534Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4305937Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4306083Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.4306422Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4306547Z return func(*args, **kwargs) 2023-01-11T22:51:00.4306902Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4307006Z p_assert( 2023-01-11T22:51:00.4307340Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4307466Z traceback.print_stack() 2023-01-11T22:51:00.4307710Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 0 2023-01-11T22:51:00.4307944Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 1 2023-01-11T22:51:00.4308344Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2023-01-11T22:51:00.4308740Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2023-01-11T22:51:00.4309465Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4310205Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4310502Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 0 2023-01-11T22:51:00.4310739Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 1 2023-01-11T22:51:00.4311136Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:51:00.4311527Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:51:00.4311767Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 0 2023-01-11T22:51:00.4312001Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 1 2023-01-11T22:51:00.4312390Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2023-01-11T22:51:00.4312785Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2023-01-11T22:51:00.4313081Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 0 2023-01-11T22:51:00.4313300Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 1 2023-01-11T22:51:00.4313691Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 2023-01-11T22:51:00.4314077Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 
2023-01-11T22:51:00.4314318Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 0 2023-01-11T22:51:00.4314555Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 1 2023-01-11T22:51:00.4314945Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:51:00.4315340Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:51:00.4315578Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 0 2023-01-11T22:51:00.4315811Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 1 2023-01-11T22:51:00.4316180Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2023-01-11T22:51:00.4316567Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2023-01-11T22:51:00.4316804Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 0 2023-01-11T22:51:00.4317036Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 1 2023-01-11T22:51:00.4317427Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:51:00.4317814Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:51:00.4318555Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4318793Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 0 2023-01-11T22:51:00.4319025Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 1 2023-01-11T22:51:00.4319413Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:51:00.4319861Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:51:00.4320579Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4321311Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4322092Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4322840Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4323565Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4324297Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4325028Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4325756Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4326482Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4327207Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4327931Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4328658Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4329437Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4330162Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4330889Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4331658Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4332385Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4333108Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4333835Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4334556Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4335275Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4336003Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4336941Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4337675Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4338483Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4339204Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4339922Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4340699Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4341426Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4342146Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4342874Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4343594Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4344312Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4345038Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4345760Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4346480Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4347252Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4347972Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4348218Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 0 2023-01-11T22:51:00.4348460Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 1 2023-01-11T22:51:00.4348939Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 2023-01-11T22:51:00.4349342Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 2023-01-11T22:51:00.4350068Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4350791Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4351033Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 0 2023-01-11T22:51:00.4351250Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 1 2023-01-11T22:51:00.4351643Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:51:00.4352030Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:51:00.4352756Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4353000Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 0 2023-01-11T22:51:00.4353394Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 2023-01-11T22:51:00.4353631Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 1 2023-01-11T22:51:00.4354022Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 2023-01-11T22:51:00.4354753Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4354990Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 0 2023-01-11T22:51:00.4355274Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 1 2023-01-11T22:51:00.4355672Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:51:00.4356043Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:51:00.4356776Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4357017Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:26 to store for rank: 0 2023-01-11T22:51:00.4357401Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:26 with 2 nodes. 2023-01-11T22:51:00.4357642Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:26 to store for rank: 1 2023-01-11T22:51:00.4358087Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:26 with 2 nodes. 2023-01-11T22:51:00.4358828Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4359065Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:27 to store for rank: 0 2023-01-11T22:51:00.4359296Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:27 to store for rank: 1 2023-01-11T22:51:00.4359684Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:27 with 2 nodes. 2023-01-11T22:51:00.4360080Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:27 with 2 nodes. 2023-01-11T22:51:00.4360798Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4361036Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:28 to store for rank: 0 2023-01-11T22:51:00.4361268Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:28 to store for rank: 1 2023-01-11T22:51:00.4361658Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:28 with 2 nodes. 2023-01-11T22:51:00.4362043Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:28 with 2 nodes. 2023-01-11T22:51:00.4362779Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4363017Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:29 to store for rank: 0 2023-01-11T22:51:00.4363249Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:29 to store for rank: 1 2023-01-11T22:51:00.4363632Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:29 with 2 nodes. 2023-01-11T22:51:00.4364020Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:29 with 2 nodes. 2023-01-11T22:51:00.4364751Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4365042Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:30 to store for rank: 0 2023-01-11T22:51:00.4365257Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:30 to store for rank: 1 2023-01-11T22:51:00.4365649Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:30 with 2 nodes. 2023-01-11T22:51:00.4366042Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:30 with 2 nodes. 2023-01-11T22:51:00.4366775Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4367133Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:31 to store for rank: 0 2023-01-11T22:51:00.4367375Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:31 to store for rank: 1 2023-01-11T22:51:00.4367767Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:31 with 2 nodes. 2023-01-11T22:51:00.4368157Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:31 with 2 nodes. 2023-01-11T22:51:00.4368888Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4369136Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:32 to store for rank: 0 2023-01-11T22:51:00.4369374Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:32 to store for rank: 1 2023-01-11T22:51:00.4369746Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:32 with 2 nodes. 2023-01-11T22:51:00.4370133Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:32 with 2 nodes. 2023-01-11T22:51:00.4370867Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4371108Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:33 to store for rank: 0 2023-01-11T22:51:00.4371341Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:33 to store for rank: 1 2023-01-11T22:51:00.4371730Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:33 with 2 nodes. 2023-01-11T22:51:00.4372119Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:33 with 2 nodes. 2023-01-11T22:51:00.4372848Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4373571Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4374361Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4375094Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4375818Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4376839Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4377605Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4378326Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4379061Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4379784Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4380502Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4381228Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4381951Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4382671Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4383468Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4384192Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4384911Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4385679Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4386405Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4387127Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4387852Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4388577Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4389293Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4390020Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4390741Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4391462Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4392235Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4392999Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4393720Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4394490Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4395219Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4395939Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4396663Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4397389Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4398105Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4398828Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4399529Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4400243Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4401024Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4401743Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4402465Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4403227Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4403956Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4404673Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4405398Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4406117Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4406361Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:34 to store for rank: 0 2023-01-11T22:51:00.4406597Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:34 to store for rank: 1 2023-01-11T22:51:00.4407001Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:34 with 2 nodes. 2023-01-11T22:51:00.4407398Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:34 with 2 nodes. 2023-01-11T22:51:00.4408125Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4408846Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4409138Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:35 to store for rank: 0 2023-01-11T22:51:00.4409373Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:35 to store for rank: 1 2023-01-11T22:51:00.4409768Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:35 with 2 nodes. 2023-01-11T22:51:00.4410157Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:35 with 2 nodes. 2023-01-11T22:51:00.4410881Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4411604Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4411872Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:36 to store for rank: 0 2023-01-11T22:51:00.4412111Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:36 to store for rank: 1 2023-01-11T22:51:00.4412504Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:36 with 2 nodes. 2023-01-11T22:51:00.4412890Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:36 with 2 nodes. 2023-01-11T22:51:00.4413614Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4414345Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4414582Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:37 to store for rank: 0 2023-01-11T22:51:00.4414813Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:37 to store for rank: 1 2023-01-11T22:51:00.4415203Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:37 with 2 nodes. 2023-01-11T22:51:00.4415584Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:37 with 2 nodes. 2023-01-11T22:51:00.4416313Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4417261Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4417379Z dist init r=1, world=2 2023-01-11T22:51:00.4417687Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.4418002Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.4418395Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 
2023-01-11T22:51:00.4418698Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.4418998Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.4419297Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.4419594Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.4419895Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.4420248Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.4420556Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.4420852Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.4421148Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.4421243Z dist init r=0, world=2 2023-01-11T22:51:00.4421567Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 
2023-01-11T22:51:00.4421873Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.4422169Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.4422465Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.4422763Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.4423058Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.4423361Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.4423655Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.4423951Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.4424250Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.4424528Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 
2023-01-11T22:51:00.4424880Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.4424983Z ok (7.216s) 2023-01-11T22:51:00.4425305Z test_mixture_of_experts_offload_true_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 91656 2023-01-11T22:51:00.4425525Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 91657 2023-01-11T22:51:00.4425912Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.4426087Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.4426467Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.4426661Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.4427007Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.4427225Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.4427609Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.4427804Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.4428047Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.4428291Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.4428688Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:51:00.4429081Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.4429317Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.4429527Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.4430538Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.4430651Z warnings.warn( 2023-01-11T22:51:00.4430890Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:51:00.4431894Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.4432007Z warnings.warn( 2023-01-11T22:51:00.4432246Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:51:00.4432640Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 
2023-01-11T22:51:00.4433166Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2023-01-11T22:51:00.4433338Z warnings.warn( 2023-01-11T22:51:00.4433733Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:51:00.4434257Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2023-01-11T22:51:00.4434350Z warnings.warn( 2023-01-11T22:51:00.4435093Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4435817Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4435954Z File "<string>", line 1, in <module> 2023-01-11T22:51:00.4436237Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4436388Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4436591Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4436742Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4436956Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4437042Z self.run() 2023-01-11T22:51:00.4437245Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4437392Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4437733Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4437870Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4438228Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4438355Z getattr(self, test_name)() 2023-01-11T22:51:00.4438716Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4438797Z fn() 2023-01-11T22:51:00.4439162Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4439286Z test(self, **param_kwargs) 2023-01-11T22:51:00.4439637Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4439762Z return func(*args, **kwargs) 2023-01-11T22:51:00.4440005Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4440123Z self.run_subtests( 2023-01-11T22:51:00.4440473Z File
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4440620Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4440978Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4441133Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4441502Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4441622Z output = model(*input) 2023-01-11T22:51:00.4441947Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4442086Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4442461Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4442672Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4443041Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4443164Z _lazy_init(state, module) 2023-01-11T22:51:00.4443516Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4443683Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4444078Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4444222Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4444558Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4444666Z return func(*args, **kwargs) 2023-01-11T22:51:00.4445048Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4445153Z p_assert( 2023-01-11T22:51:00.4445536Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4445670Z traceback.print_stack() 2023-01-11T22:51:00.4445799Z File "<string>", line 1, in <module> 2023-01-11T22:51:00.4446008Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4446151Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4446336Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4446487Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4446701Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4446807Z self.run() 2023-01-11T22:51:00.4447011Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4447158Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4447506Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4447621Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4447981Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4448104Z getattr(self, test_name)() 2023-01-11T22:51:00.4448464Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4448562Z fn() 2023-01-11T22:51:00.4448923Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4449046Z test(self, **param_kwargs) 2023-01-11T22:51:00.4449400Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in
wrapper 2023-01-11T22:51:00.4449511Z return func(*args, **kwargs) 2023-01-11T22:51:00.4449757Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4449870Z self.run_subtests( 2023-01-11T22:51:00.4450221Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4450382Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4450740Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4450892Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4451262Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4451364Z output = model(*input) 2023-01-11T22:51:00.4451747Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4451885Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4452263Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4452436Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4452801Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4452920Z _lazy_init(state, module) 2023-01-11T22:51:00.4453272Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4453422Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4453817Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4453962Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.4454300Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4454471Z return func(*args, **kwargs) 2023-01-11T22:51:00.4454856Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4454960Z p_assert( 2023-01-11T22:51:00.4455293Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4455401Z traceback.print_stack() 2023-01-11T22:51:00.4455647Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:51:00.4455889Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:51:00.4456286Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:51:00.4457265Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4457398Z File "<string>", line 1, in <module> 2023-01-11T22:51:00.4457608Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4457751Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4457954Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4458088Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4458300Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4458406Z self.run() 2023-01-11T22:51:00.4458610Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4458760Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4459108Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4459242Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4459601Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4459707Z getattr(self, test_name)() 2023-01-11T22:51:00.4460065Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4460168Z fn() 2023-01-11T22:51:00.4460529Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4460651Z test(self, **param_kwargs) 2023-01-11T22:51:00.4461006Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4461217Z return func(*args, **kwargs) 2023-01-11T22:51:00.4461467Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4461564Z self.run_subtests( 2023-01-11T22:51:00.4461919Z File
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4462080Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4462438Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4462590Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4462962Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4463081Z output = model(*input) 2023-01-11T22:51:00.4463410Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4463531Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4463967Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4464153Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4464524Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4464646Z _lazy_init(state, module) 2023-01-11T22:51:00.4465000Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4465168Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4465566Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4465695Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4466032Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4466161Z return func(*args, **kwargs) 2023-01-11T22:51:00.4466535Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4466639Z p_assert( 2023-01-11T22:51:00.4466971Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4467098Z traceback.print_stack() 2023-01-11T22:51:00.4467496Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:51:00.4468238Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4468356Z File "<string>", line 1, in <module> 2023-01-11T22:51:00.4468569Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4468711Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4468913Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4469063Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4469275Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4469377Z self.run() 2023-01-11T22:51:00.4469576Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4469705Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4470044Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4470236Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4470598Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4470729Z getattr(self, test_name)() 2023-01-11T22:51:00.4471086Z
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4471187Z fn() 2023-01-11T22:51:00.4471532Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4471662Z test(self, **param_kwargs) 2023-01-11T22:51:00.4472018Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4472141Z return func(*args, **kwargs) 2023-01-11T22:51:00.4472388Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4472506Z self.run_subtests( 2023-01-11T22:51:00.4472856Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4473063Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4473416Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4473570Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4473945Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4474066Z output = model(*input) 2023-01-11T22:51:00.4474390Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4474521Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4493680Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4493924Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4494360Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.4494487Z _lazy_init(state, module) 2023-01-11T22:51:00.4494847Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4495014Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4495398Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4495544Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4495884Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4496011Z return func(*args, **kwargs) 2023-01-11T22:51:00.4496393Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4496496Z p_assert( 2023-01-11T22:51:00.4497191Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4497324Z traceback.print_stack() 2023-01-11T22:51:00.4497553Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2023-01-11T22:51:00.4497798Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2023-01-11T22:51:00.4498199Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:51:00.4498945Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4499241Z File "", line 1, in 2023-01-11T22:51:00.4499459Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4499605Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4499812Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4499963Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4500157Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4500262Z self.run() 2023-01-11T22:51:00.4500463Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4500607Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4500957Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4501096Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4501461Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4501655Z getattr(self, test_name)() 2023-01-11T22:51:00.4502010Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4502109Z fn() 2023-01-11T22:51:00.4502474Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4502597Z test(self, **param_kwargs) 2023-01-11T22:51:00.4502951Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4503077Z return func(*args, **kwargs) 2023-01-11T22:51:00.4503321Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4503422Z self.run_subtests( 2023-01-11T22:51:00.4503775Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4503939Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4504301Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4504453Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4504828Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4504947Z output = model(*input) 2023-01-11T22:51:00.4505273Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4505414Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4505772Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4505950Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4506317Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4506439Z _lazy_init(state, module) 2023-01-11T22:51:00.4506790Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4506957Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4507353Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4507495Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4507814Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4507940Z return func(*args, **kwargs) 2023-01-11T22:51:00.4508424Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4508528Z p_assert( 2023-01-11T22:51:00.4508869Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4508997Z traceback.print_stack() 2023-01-11T22:51:00.4509393Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:51:00.4510133Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4510264Z File "", line 1, in 2023-01-11T22:51:00.4510456Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4510601Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4510805Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4510999Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4511223Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4511330Z self.run() 2023-01-11T22:51:00.4511532Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4511660Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4512000Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4512134Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4512497Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4512627Z getattr(self, test_name)() 2023-01-11T22:51:00.4512986Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4513085Z fn() 2023-01-11T22:51:00.4513453Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4513558Z test(self, **param_kwargs) 2023-01-11T22:51:00.4513916Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4514039Z return func(*args, **kwargs) 2023-01-11T22:51:00.4514284Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4514398Z self.run_subtests( 2023-01-11T22:51:00.4514749Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4514910Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4515272Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4515410Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4515786Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4515905Z output = model(*input) 2023-01-11T22:51:00.4516230Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4516369Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4516743Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4516917Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4517284Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.4517443Z _lazy_init(state, module) 2023-01-11T22:51:00.4517802Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4517968Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4518363Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4518506Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4518844Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4518971Z return func(*args, **kwargs) 2023-01-11T22:51:00.4519345Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4519431Z p_assert( 2023-01-11T22:51:00.4519767Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4519895Z traceback.print_stack() 2023-01-11T22:51:00.4520200Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 1 2023-01-11T22:51:00.4520451Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 0 2023-01-11T22:51:00.4520855Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:51:00.4521601Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4521735Z File "", line 1, in 2023-01-11T22:51:00.4521947Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4522077Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4522279Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4522430Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4522642Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4522744Z self.run() 2023-01-11T22:51:00.4522942Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4523088Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4523433Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4523549Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4523909Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4524031Z getattr(self, test_name)() 2023-01-11T22:51:00.4524395Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4524496Z fn() 2023-01-11T22:51:00.4524862Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4524984Z test(self, **param_kwargs) 2023-01-11T22:51:00.4525341Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4525449Z return func(*args, **kwargs) 2023-01-11T22:51:00.4525693Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4525806Z self.run_subtests( 2023-01-11T22:51:00.4526155Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4526316Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4526742Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4526901Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4527275Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4527377Z output = model(*input) 2023-01-11T22:51:00.4527702Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4527839Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4528213Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4528388Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4528753Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4528877Z _lazy_init(state, module) 2023-01-11T22:51:00.4529275Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4529436Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4529840Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4529986Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4530324Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4530449Z return func(*args, **kwargs) 2023-01-11T22:51:00.4530822Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4530923Z p_assert( 2023-01-11T22:51:00.4531256Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4531370Z traceback.print_stack() 2023-01-11T22:51:00.4531770Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:51:00.4532515Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4532649Z File "", line 1, in 2023-01-11T22:51:00.4532858Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4533000Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4533201Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4533351Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4533565Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4533652Z self.run() 2023-01-11T22:51:00.4533854Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4534001Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4534342Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4534474Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4534834Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4534956Z getattr(self, test_name)() 2023-01-11T22:51:00.4535295Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4535393Z fn() 2023-01-11T22:51:00.4535753Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4535935Z test(self, **param_kwargs) 2023-01-11T22:51:00.4536303Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4536429Z return func(*args, **kwargs) 2023-01-11T22:51:00.4536873Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4536996Z self.run_subtests( 2023-01-11T22:51:00.4537338Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4537502Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4537859Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4538010Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4538388Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4538506Z output = model(*input) 2023-01-11T22:51:00.4538903Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4539054Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4539415Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4539591Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4539956Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.4540078Z _lazy_init(state, module) 2023-01-11T22:51:00.4540433Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4540604Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4541000Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4541145Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4541483Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4541591Z return func(*args, **kwargs) 2023-01-11T22:51:00.4541965Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4542066Z p_assert( 2023-01-11T22:51:00.4542404Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4542532Z traceback.print_stack() 2023-01-11T22:51:00.4542775Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 1 2023-01-11T22:51:00.4543021Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 0 2023-01-11T22:51:00.4543420Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:51:00.4544150Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4544282Z File "", line 1, in 2023-01-11T22:51:00.4544491Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4544634Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4544835Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4544985Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4545272Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4545377Z self.run() 2023-01-11T22:51:00.4545562Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4545710Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4546054Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4546188Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4546554Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4546679Z getattr(self, test_name)() 2023-01-11T22:51:00.4547038Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4547136Z fn() 2023-01-11T22:51:00.4547481Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4547607Z test(self, **param_kwargs) 2023-01-11T22:51:00.4548008Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4548140Z return func(*args, **kwargs) 2023-01-11T22:51:00.4548388Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4548503Z self.run_subtests( 2023-01-11T22:51:00.4548855Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4549015Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4549354Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4549506Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4549880Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4550000Z output = model(*input) 2023-01-11T22:51:00.4550327Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4550464Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4550839Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4551012Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4551359Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4551479Z _lazy_init(state, module) 2023-01-11T22:51:00.4551829Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4551999Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4552395Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4552539Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4552875Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4552999Z return func(*args, **kwargs) 2023-01-11T22:51:00.4553358Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4553460Z p_assert( 2023-01-11T22:51:00.4553794Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4553919Z traceback.print_stack() 2023-01-11T22:51:00.4554318Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:51:00.4555127Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4555260Z File "", line 1, in 2023-01-11T22:51:00.4555469Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4555613Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4555795Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4555947Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4556164Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4556268Z self.run() 2023-01-11T22:51:00.4556472Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4556620Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4557008Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4557147Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4557491Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4557616Z getattr(self, test_name)() 2023-01-11T22:51:00.4557977Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4558076Z fn() 2023-01-11T22:51:00.4558440Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4558561Z test(self, **param_kwargs) 2023-01-11T22:51:00.4558916Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4559027Z return func(*args, **kwargs) 2023-01-11T22:51:00.4559275Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4559387Z self.run_subtests( 2023-01-11T22:51:00.4559739Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4559899Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4560258Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4560410Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4560782Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4560884Z output = model(*input) 2023-01-11T22:51:00.4561208Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4561347Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4561723Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4561895Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4562260Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.4562380Z _lazy_init(state, module) 2023-01-11T22:51:00.4562730Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4562895Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4563271Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4563473Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4563818Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4563944Z return func(*args, **kwargs) 2023-01-11T22:51:00.4564324Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4564425Z p_assert( 2023-01-11T22:51:00.4564758Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4564883Z traceback.print_stack() 2023-01-11T22:51:00.4565108Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 1 2023-01-11T22:51:00.4565354Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 0 2023-01-11T22:51:00.4565752Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:51:00.4566545Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4566682Z File "", line 1, in 2023-01-11T22:51:00.4566894Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4567040Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4567244Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4567394Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4567588Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4567693Z self.run() 2023-01-11T22:51:00.4567892Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4568041Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4568384Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4568516Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4568875Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4568981Z getattr(self, test_name)() 2023-01-11T22:51:00.4569338Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4569437Z fn() 2023-01-11T22:51:00.4569804Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4569928Z test(self, **param_kwargs) 2023-01-11T22:51:00.4570282Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4570409Z return func(*args, **kwargs) 2023-01-11T22:51:00.4570654Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4570753Z self.run_subtests( 2023-01-11T22:51:00.4571102Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4571263Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4571620Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4571772Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4572145Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4572265Z output = model(*input) 2023-01-11T22:51:00.4572588Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4572763Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4573149Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4573322Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4573691Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4573813Z _lazy_init(state, module) 2023-01-11T22:51:00.4574164Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4574332Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4574731Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4574855Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4575196Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4575321Z return func(*args, **kwargs) 2023-01-11T22:51:00.4575736Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4575846Z p_assert( 2023-01-11T22:51:00.4576185Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4576311Z traceback.print_stack() 2023-01-11T22:51:00.4576981Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:51:00.4577742Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4577863Z File "", line 1, in 2023-01-11T22:51:00.4578076Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4578217Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4578418Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4578567Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4578779Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4578883Z self.run() 2023-01-11T22:51:00.4579083Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4579212Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4579554Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4579689Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4580049Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4580180Z getattr(self, test_name)() 2023-01-11T22:51:00.4580540Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4580638Z fn() 2023-01-11T22:51:00.4580998Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4581103Z test(self, **param_kwargs) 2023-01-11T22:51:00.4581456Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4581582Z return func(*args, **kwargs) 2023-01-11T22:51:00.4581826Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4582033Z self.run_subtests( 2023-01-11T22:51:00.4582388Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4582554Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4582916Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4583051Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4583422Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4583540Z output = model(*input) 2023-01-11T22:51:00.4583865Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4584003Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4584378Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4584552Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4584990Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.4585104Z _lazy_init(state, module) 2023-01-11T22:51:00.4585460Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4585628Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4586026Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4586167Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4586504Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4586631Z return func(*args, **kwargs) 2023-01-11T22:51:00.4587010Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4587097Z p_assert( 2023-01-11T22:51:00.4587434Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4587561Z traceback.print_stack() 2023-01-11T22:51:00.4587803Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 1 2023-01-11T22:51:00.4588046Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 0 2023-01-11T22:51:00.4588444Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:51:00.4589185Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4589319Z File "", line 1, in 2023-01-11T22:51:00.4589530Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4589655Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4589858Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4590005Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4590216Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4590321Z self.run() 2023-01-11T22:51:00.4590521Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4590665Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4590988Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4591178Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4591542Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4591670Z getattr(self, test_name)() 2023-01-11T22:51:00.4592030Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4592129Z fn() 2023-01-11T22:51:00.4592495Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4592665Z test(self, **param_kwargs) 2023-01-11T22:51:00.4593009Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4593136Z return func(*args, **kwargs) 2023-01-11T22:51:00.4593383Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4593500Z self.run_subtests( 2023-01-11T22:51:00.4593854Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4594066Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4594436Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4594589Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4594944Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4595066Z output = model(*input) 2023-01-11T22:51:00.4595390Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4595527Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4595902Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4596080Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4596448Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4596569Z _lazy_init(state, module) 2023-01-11T22:51:00.4596902Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4597068Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4597464Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4597607Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4597942Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4598069Z return func(*args, **kwargs) 2023-01-11T22:51:00.4598449Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4598551Z p_assert( 2023-01-11T22:51:00.4598870Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4598996Z traceback.print_stack() 2023-01-11T22:51:00.4599391Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:51:00.4600139Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4600270Z File "", line 1, in 2023-01-11T22:51:00.4600477Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4600682Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4600885Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4601039Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4601233Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4601339Z self.run() 2023-01-11T22:51:00.4601540Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4601686Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4602027Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4602159Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4602516Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4602640Z getattr(self, test_name)() 2023-01-11T22:51:00.4602982Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4603080Z fn() 2023-01-11T22:51:00.4603484Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4603615Z test(self, **param_kwargs) 2023-01-11T22:51:00.4603972Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4604097Z return func(*args, **kwargs) 2023-01-11T22:51:00.4604340Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4604453Z self.run_subtests( 2023-01-11T22:51:00.4604785Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4604946Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4605308Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4605458Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4605837Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4605955Z output = model(*input) 2023-01-11T22:51:00.4606281Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4606418Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4606775Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4606948Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4607313Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.4607436Z _lazy_init(state, module) 2023-01-11T22:51:00.4607787Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4607957Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4608353Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4608493Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4608811Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4608938Z return func(*args, **kwargs) 2023-01-11T22:51:00.4609309Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4609411Z p_assert( 2023-01-11T22:51:00.4609745Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4609927Z traceback.print_stack() 2023-01-11T22:51:00.4610173Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 1 2023-01-11T22:51:00.4610417Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 0 2023-01-11T22:51:00.4610805Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:51:00.4611547Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4611678Z File "", line 1, in 2023-01-11T22:51:00.4611887Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4612031Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4612231Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4612426Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4612646Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4612752Z self.run() 2023-01-11T22:51:00.4612931Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4613076Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4613417Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4613549Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4613908Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4614030Z getattr(self, test_name)() 2023-01-11T22:51:00.4614394Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4614475Z fn() 2023-01-11T22:51:00.4614840Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4614963Z test(self, **param_kwargs) 2023-01-11T22:51:00.4615318Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4615443Z return func(*args, **kwargs) 2023-01-11T22:51:00.4615689Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4615803Z self.run_subtests( 2023-01-11T22:51:00.4616154Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4616299Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4616846Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4617011Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4617398Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4617521Z output = model(*input) 2023-01-11T22:51:00.4617849Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4617990Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4618366Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4618542Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4618889Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4619100Z _lazy_init(state, module) 2023-01-11T22:51:00.4619455Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4619629Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4620027Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4620172Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4620510Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4620636Z return func(*args, **kwargs) 2023-01-11T22:51:00.4620994Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4621096Z p_assert( 2023-01-11T22:51:00.4621430Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4621560Z traceback.print_stack() 2023-01-11T22:51:00.4622017Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:51:00.4622776Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4622907Z File "", line 1, in 2023-01-11T22:51:00.4623114Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4623238Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4623441Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4623590Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4623808Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4623911Z self.run() 2023-01-11T22:51:00.4624113Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4624259Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4624598Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4624713Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4625072Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4625194Z getattr(self, test_name)() 2023-01-11T22:51:00.4625550Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4625648Z fn() 2023-01-11T22:51:00.4626010Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4626136Z test(self, **param_kwargs) 2023-01-11T22:51:00.4626497Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4626606Z return func(*args, **kwargs) 2023-01-11T22:51:00.4626850Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4626964Z self.run_subtests( 2023-01-11T22:51:00.4627314Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4627477Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4627839Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4627990Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4628361Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4628518Z output = model(*input) 2023-01-11T22:51:00.4628849Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4628988Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4629360Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4629533Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4629899Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.4630021Z _lazy_init(state, module) 2023-01-11T22:51:00.4630375Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4630524Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4630924Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4631109Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4631456Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4631581Z return func(*args, **kwargs) 2023-01-11T22:51:00.4631955Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4632058Z p_assert( 2023-01-11T22:51:00.4632392Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4632501Z traceback.print_stack() 2023-01-11T22:51:00.4632745Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 1 2023-01-11T22:51:00.4632982Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 0 2023-01-11T22:51:00.4633390Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:51:00.4634134Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4634871Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4635607Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4636349Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4637076Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4637209Z File "", line 1, in 2023-01-11T22:51:00.4637420Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4637631Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4637840Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4637974Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4638186Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4638291Z self.run() 2023-01-11T22:51:00.4638493Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4638638Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4638985Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4639117Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4639458Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4639587Z getattr(self, test_name)() 2023-01-11T22:51:00.4639945Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4640044Z fn() 2023-01-11T22:51:00.4640452Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4640582Z test(self, **param_kwargs) 2023-01-11T22:51:00.4640942Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4641069Z return func(*args, **kwargs) 2023-01-11T22:51:00.4641294Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4641408Z self.run_subtests( 2023-01-11T22:51:00.4641758Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4641925Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4642283Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4642437Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4642810Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4642929Z output = model(*input) 2023-01-11T22:51:00.4643236Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4643376Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4643748Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4643922Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4644286Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4644411Z _lazy_init(state, module) 2023-01-11T22:51:00.4644764Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4644931Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4645326Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4645450Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4645787Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4645911Z return func(*args, **kwargs) 2023-01-11T22:51:00.4646287Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4646391Z p_assert( 2023-01-11T22:51:00.4646786Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4646914Z traceback.print_stack() 2023-01-11T22:51:00.4647317Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:51:00.4648044Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4648782Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4649617Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4650366Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4651087Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4651224Z File "", line 1, in 2023-01-11T22:51:00.4651435Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4651575Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4651782Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4651934Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4652145Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4652249Z self.run() 2023-01-11T22:51:00.4652433Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4652579Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4652920Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4653052Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4653413Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4653538Z getattr(self, test_name)() 2023-01-11T22:51:00.4653899Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4653996Z fn() 2023-01-11T22:51:00.4654343Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4654464Z test(self, **param_kwargs) 2023-01-11T22:51:00.4654817Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4654941Z return func(*args, **kwargs) 2023-01-11T22:51:00.4655187Z File 
"/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4655300Z self.run_subtests( 2023-01-11T22:51:00.4655649Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4655848Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4656214Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4656367Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4657047Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4657174Z output = model(*input) 2023-01-11T22:51:00.4657510Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4657649Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4658024Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4658199Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4658549Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4658672Z _lazy_init(state, module) 2023-01-11T22:51:00.4659101Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4659283Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4659684Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4659828Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4660165Z File 
"/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4660290Z return func(*args, **kwargs) 2023-01-11T22:51:00.4660646Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4660755Z p_assert( 2023-01-11T22:51:00.4661090Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4661216Z traceback.print_stack() 2023-01-11T22:51:00.4661465Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 1 2023-01-11T22:51:00.4661704Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 0 2023-01-11T22:51:00.4662102Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:51:00.4662493Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:51:00.4663235Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4663980Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4664093Z File "", line 1, in 2023-01-11T22:51:00.4664304Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4664444Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4664645Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4664795Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4665005Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4665183Z self.run() 2023-01-11T22:51:00.4665369Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4665516Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4665862Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4665995Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4666355Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4666480Z getattr(self, test_name)() 2023-01-11T22:51:00.4666839Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4666994Z fn() 2023-01-11T22:51:00.4667349Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4667476Z test(self, **param_kwargs) 2023-01-11T22:51:00.4667840Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4667968Z return func(*args, **kwargs) 2023-01-11T22:51:00.4668291Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4668414Z self.run_subtests( 2023-01-11T22:51:00.4668769Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4668934Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4669275Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4669429Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4669801Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4669924Z output = model(*input) 2023-01-11T22:51:00.4670247Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4670389Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4670766Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4670939Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4671285Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4671406Z _lazy_init(state, module) 2023-01-11T22:51:00.4671755Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4671920Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4672315Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4672463Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4672807Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4672930Z return func(*args, **kwargs) 2023-01-11T22:51:00.4673304Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4673389Z p_assert( 2023-01-11T22:51:00.4673720Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4673847Z traceback.print_stack() 2023-01-11T22:51:00.4673975Z File "", line 1, in 2023-01-11T22:51:00.4674185Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4674325Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4674524Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4674711Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4674927Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4675035Z self.run() 2023-01-11T22:51:00.4675239Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4675384Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4675724Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4675854Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4676215Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4676321Z getattr(self, test_name)() 2023-01-11T22:51:00.4676681Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4676781Z fn() 2023-01-11T22:51:00.4677143Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4677311Z test(self, **param_kwargs) 2023-01-11T22:51:00.4677675Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.4677801Z return func(*args, **kwargs) 2023-01-11T22:51:00.4678048Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4678145Z self.run_subtests( 2023-01-11T22:51:00.4678492Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4678683Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4679046Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4679205Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4679606Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4679726Z output = model(*input) 2023-01-11T22:51:00.4680050Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4680171Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4680545Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4680718Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4681080Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4681200Z _lazy_init(state, module) 2023-01-11T22:51:00.4681551Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4681719Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4682116Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4682241Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.4682579Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4682704Z return func(*args, **kwargs) 2023-01-11T22:51:00.4683076Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4683178Z p_assert( 2023-01-11T22:51:00.4683509Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4683634Z traceback.print_stack() 2023-01-11T22:51:00.4683937Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 1 2023-01-11T22:51:00.4684157Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 0 2023-01-11T22:51:00.4684558Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:51:00.4685302Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4685434Z File "", line 1, in 2023-01-11T22:51:00.4685644Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4685788Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4685989Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4686143Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4686401Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4686497Z self.run() 2023-01-11T22:51:00.4686702Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4686849Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4687190Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4687322Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4687681Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4687802Z getattr(self, test_name)() 2023-01-11T22:51:00.4688163Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4688248Z fn() 2023-01-11T22:51:00.4688611Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4688737Z test(self, **param_kwargs) 2023-01-11T22:51:00.4689093Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4689218Z return func(*args, **kwargs) 2023-01-11T22:51:00.4689463Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4689576Z self.run_subtests( 2023-01-11T22:51:00.4689909Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4690070Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4690429Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4690585Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4690961Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4691082Z output = model(*input) 2023-01-11T22:51:00.4691408Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4691546Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4691902Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4692078Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4692441Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4692606Z _lazy_init(state, module) 2023-01-11T22:51:00.4692970Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4693197Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4693601Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4693745Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4694082Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4694189Z return func(*args, **kwargs) 2023-01-11T22:51:00.4694560Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4694661Z p_assert( 2023-01-11T22:51:00.4694994Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4695121Z traceback.print_stack() 2023-01-11T22:51:00.4695521Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:51:00.4696307Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4696447Z File "", line 1, in 2023-01-11T22:51:00.4696840Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4696993Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4697197Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4697351Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4697564Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4697674Z self.run() 2023-01-11T22:51:00.4697875Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4698021Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4698357Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4698492Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4698854Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4698980Z getattr(self, test_name)() 2023-01-11T22:51:00.4699339Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4699436Z fn() 2023-01-11T22:51:00.4699796Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4699918Z test(self, **param_kwargs) 2023-01-11T22:51:00.4700258Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4700382Z return func(*args, **kwargs) 2023-01-11T22:51:00.4700630Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4700745Z self.run_subtests( 2023-01-11T22:51:00.4701094Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4701253Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4701612Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4701764Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4702118Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4702328Z output = model(*input) 2023-01-11T22:51:00.4702657Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4702800Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4703179Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4703353Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4703717Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.4703836Z _lazy_init(state, module) 2023-01-11T22:51:00.4704170Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4704337Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4704731Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4704877Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4705278Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4705414Z return func(*args, **kwargs) 2023-01-11T22:51:00.4705793Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4705898Z p_assert( 2023-01-11T22:51:00.4706214Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4706342Z traceback.print_stack() 2023-01-11T22:51:00.4706588Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 1 2023-01-11T22:51:00.4706829Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 0 2023-01-11T22:51:00.4707234Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:51:00.4707980Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4708111Z File "", line 1, in 2023-01-11T22:51:00.4708320Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4708459Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4708646Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4708796Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4709007Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4709114Z self.run() 2023-01-11T22:51:00.4709315Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4709461Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4709802Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4709934Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4710278Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4710402Z getattr(self, test_name)() 2023-01-11T22:51:00.4710759Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4710858Z fn() 2023-01-11T22:51:00.4711219Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4711341Z test(self, **param_kwargs) 2023-01-11T22:51:00.4711760Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4711887Z return func(*args, **kwargs) 2023-01-11T22:51:00.4712116Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4712231Z self.run_subtests( 2023-01-11T22:51:00.4712581Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4712742Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4713104Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4713255Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4713628Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4713751Z output = model(*input) 2023-01-11T22:51:00.4714056Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4714251Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4714640Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4714815Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4715176Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4715297Z _lazy_init(state, module) 2023-01-11T22:51:00.4715647Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4715815Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4716192Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4716338Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4716677Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4716804Z return func(*args, **kwargs) 2023-01-11T22:51:00.4717177Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4717280Z p_assert( 2023-01-11T22:51:00.4717614Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4717740Z traceback.print_stack() 2023-01-11T22:51:00.4718120Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:51:00.4718857Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4718996Z File "", line 1, in 2023-01-11T22:51:00.4719205Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4719346Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4719547Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4719694Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4719905Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4719990Z self.run() 2023-01-11T22:51:00.4720190Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4720334Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4720672Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4720860Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4721228Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4721351Z getattr(self, test_name)() 2023-01-11T22:51:00.4721709Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4721789Z fn() 2023-01-11T22:51:00.4722154Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4722278Z test(self, **param_kwargs) 2023-01-11T22:51:00.4722635Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4722760Z return func(*args, **kwargs) 2023-01-11T22:51:00.4723005Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4723122Z self.run_subtests( 2023-01-11T22:51:00.4723515Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4723664Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4724028Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4724179Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4724549Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4724667Z output = model(*input) 2023-01-11T22:51:00.4724992Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4725130Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4725508Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4725664Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4726031Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.4726152Z _lazy_init(state, module) 2023-01-11T22:51:00.4726505Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4726671Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4727067Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4727208Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4727542Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4727655Z return func(*args, **kwargs) 2023-01-11T22:51:00.4728031Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4728133Z p_assert( 2023-01-11T22:51:00.4728466Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4728590Z traceback.print_stack() 2023-01-11T22:51:00.4728836Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 1 2023-01-11T22:51:00.4729076Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 0 2023-01-11T22:51:00.4729474Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2023-01-11T22:51:00.4730213Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4730668Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 
2023-01-11T22:51:00.4731376Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4731619Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 1 2023-01-11T22:51:00.4731850Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 0 2023-01-11T22:51:00.4732237Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:51:00.4732677Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:51:00.4732925Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 1 2023-01-11T22:51:00.4733161Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 0 2023-01-11T22:51:00.4733560Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2023-01-11T22:51:00.4733951Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2023-01-11T22:51:00.4734193Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 1 2023-01-11T22:51:00.4734406Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 0 2023-01-11T22:51:00.4734797Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 
2023-01-11T22:51:00.4735186Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 2023-01-11T22:51:00.4735426Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 1 2023-01-11T22:51:00.4735660Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 0 2023-01-11T22:51:00.4736048Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:51:00.4736435Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:51:00.4736905Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 1 2023-01-11T22:51:00.4737154Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 0 2023-01-11T22:51:00.4737538Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2023-01-11T22:51:00.4737926Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2023-01-11T22:51:00.4738165Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 1 2023-01-11T22:51:00.4738398Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 0 2023-01-11T22:51:00.4738785Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:51:00.4739174Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:51:00.4740009Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4740252Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 1 2023-01-11T22:51:00.4740487Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 0 2023-01-11T22:51:00.4740876Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:51:00.4741244Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:51:00.4741990Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4742793Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4743543Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4744276Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4745008Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4745742Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4746470Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4747198Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4747918Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4748645Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4749441Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4750174Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4750896Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4751661Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4752387Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4753111Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4753839Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4754561Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4755282Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4756008Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4756722Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4757442Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4758216Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4758940Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4759662Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4759956Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 1 2023-01-11T22:51:00.4760201Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 0 2023-01-11T22:51:00.4760601Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 
2023-01-11T22:51:00.4760995Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 2023-01-11T22:51:00.4761717Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4762443Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4762682Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 1 2023-01-11T22:51:00.4762913Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 0 2023-01-11T22:51:00.4763285Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:51:00.4763675Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:51:00.4763920Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 1 2023-01-11T22:51:00.4764154Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 0 2023-01-11T22:51:00.4764552Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 2023-01-11T22:51:00.4764947Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 
2023-01-11T22:51:00.4765180Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 1 2023-01-11T22:51:00.4765411Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 0 2023-01-11T22:51:00.4765796Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:51:00.4766181Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:51:00.4766454Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:26 to store for rank: 1 2023-01-11T22:51:00.4766690Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:26 to store for rank: 0 2023-01-11T22:51:00.4767083Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:26 with 2 nodes. 2023-01-11T22:51:00.4767471Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:26 with 2 nodes. 2023-01-11T22:51:00.4767704Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:27 to store for rank: 1 2023-01-11T22:51:00.4767936Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:27 to store for rank: 0 2023-01-11T22:51:00.4768323Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:27 with 2 nodes. 2023-01-11T22:51:00.4768712Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:27 with 2 nodes. 
2023-01-11T22:51:00.4768992Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:28 to store for rank: 1 2023-01-11T22:51:00.4769212Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:28 to store for rank: 0 2023-01-11T22:51:00.4769603Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:28 with 2 nodes. 2023-01-11T22:51:00.4769989Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:28 with 2 nodes. 2023-01-11T22:51:00.4770222Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:29 to store for rank: 1 2023-01-11T22:51:00.4770452Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:29 to store for rank: 0 2023-01-11T22:51:00.4770841Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:29 with 2 nodes. 2023-01-11T22:51:00.4771230Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:29 with 2 nodes. 2023-01-11T22:51:00.4771465Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:30 to store for rank: 1 2023-01-11T22:51:00.4771693Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:30 to store for rank: 0 2023-01-11T22:51:00.4772061Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:30 with 2 nodes. 2023-01-11T22:51:00.4772447Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:30 with 2 nodes. 
2023-01-11T22:51:00.4772684Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:31 to store for rank: 1 2023-01-11T22:51:00.4772915Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:31 to store for rank: 0 2023-01-11T22:51:00.4773302Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:31 with 2 nodes. 2023-01-11T22:51:00.4773686Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:31 with 2 nodes. 2023-01-11T22:51:00.4773919Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:32 to store for rank: 1 2023-01-11T22:51:00.4774149Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:32 to store for rank: 0 2023-01-11T22:51:00.4774532Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:32 with 2 nodes. 2023-01-11T22:51:00.4774919Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:32 with 2 nodes. 2023-01-11T22:51:00.4775187Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:33 to store for rank: 1 2023-01-11T22:51:00.4775417Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:33 to store for rank: 0 2023-01-11T22:51:00.4775805Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:33 with 2 nodes. 2023-01-11T22:51:00.4776186Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:33 with 2 nodes. 2023-01-11T22:51:00.4777143Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4777891Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4778719Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4779473Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4780201Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4780936Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4781662Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4782383Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4783112Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4783834Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4784557Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4785360Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4786080Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4786797Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4787560Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4788293Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4789018Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4789744Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4790462Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4791183Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4791905Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4792664Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4793398Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4794215Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4794937Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4795650Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4796412Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4797141Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4797861Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4798583Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4799302Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4800017Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4800269Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:34 to store for rank: 1 2023-01-11T22:51:00.4800507Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:34 to store for rank: 0 2023-01-11T22:51:00.4800887Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:34 with 2 nodes. 2023-01-11T22:51:00.4801608Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4801997Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:34 with 2 nodes. 
2023-01-11T22:51:00.4802772Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4803013Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:35 to store for rank: 1 2023-01-11T22:51:00.4803248Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:35 to store for rank: 0 2023-01-11T22:51:00.4803638Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:35 with 2 nodes. 2023-01-11T22:51:00.4804355Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4804785Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:35 with 2 nodes. 2023-01-11T22:51:00.4805511Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4805747Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:36 to store for rank: 1 2023-01-11T22:51:00.4805981Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:36 to store for rank: 0 2023-01-11T22:51:00.4806364Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:36 with 2 nodes. 
2023-01-11T22:51:00.4807085Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4807454Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:36 with 2 nodes. 2023-01-11T22:51:00.4808164Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4808397Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:37 to store for rank: 0 2023-01-11T22:51:00.4808627Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:37 to store for rank: 1 2023-01-11T22:51:00.4809014Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:37 with 2 nodes. 2023-01-11T22:51:00.4809404Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:37 with 2 nodes. 2023-01-11T22:51:00.4810133Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4810858Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4811025Z dist init r=1, world=2 2023-01-11T22:51:00.4811358Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.4811675Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.4811981Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.4812285Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.4812568Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.4812874Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.4813227Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.4813534Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.4813835Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.4814134Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.4814434Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.4814738Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.4814849Z dist init r=0, world=2 2023-01-11T22:51:00.4815168Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.4815479Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.4815768Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.4816073Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.4816379Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.4816897Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.4817207Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.4817509Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.4817808Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.4818190Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.4818489Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.4818786Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.4818887Z ok (7.617s) 2023-01-11T22:51:00.4819201Z test_mixture_of_experts_offload_true_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 92003 2023-01-11T22:51:00.4819421Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 92004 2023-01-11T22:51:00.4819806Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.4819986Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.4820425Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.4820627Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.4820997Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.4821175Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.4821553Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.4821725Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.4821970Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.4822219Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.4822614Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.4823005Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.4823232Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.4823459Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.4824472Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.4824588Z warnings.warn( 2023-01-11T22:51:00.4825597Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.4825710Z warnings.warn( 2023-01-11T22:51:00.4825937Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:51:00.4826176Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:51:00.4826628Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:51:00.4827023Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:51:00.4827556Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2023-01-11T22:51:00.4827667Z warnings.warn( 2023-01-11T22:51:00.4828184Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2023-01-11T22:51:00.4828292Z warnings.warn( 2023-01-11T22:51:00.4829034Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4829814Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4829953Z File "<string>", line 1, in <module> 2023-01-11T22:51:00.4830148Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4830292Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4830496Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4830648Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4830862Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4830970Z self.run() 2023-01-11T22:51:00.4831173Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4831321Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4831649Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4831782Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4832142Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4832265Z getattr(self, test_name)() 2023-01-11T22:51:00.4832626Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4832723Z fn() 2023-01-11T22:51:00.4833088Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4833213Z test(self, **param_kwargs) 2023-01-11T22:51:00.4833554Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4833681Z return func(*args, **kwargs) 2023-01-11T22:51:00.4833926Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4834038Z self.run_subtests( 2023-01-11T22:51:00.4834389Z File
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4834550Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4834907Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4835061Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4835415Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4835592Z output = model(*input) 2023-01-11T22:51:00.4835921Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4836062Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4836438Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4836610Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4836973Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4837094Z _lazy_init(state, module) 2023-01-11T22:51:00.4837427Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4837596Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4837990Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4838137Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4838527Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4838660Z return func(*args, **kwargs) 2023-01-11T22:51:00.4839044Z File
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4839145Z p_assert( 2023-01-11T22:51:00.4839460Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4839588Z traceback.print_stack() 2023-01-11T22:51:00.4839717Z File "<string>", line 1, in <module> 2023-01-11T22:51:00.4839927Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4840070Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4840275Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4840425Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4840620Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4840724Z self.run() 2023-01-11T22:51:00.4840924Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4841070Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4841407Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4841539Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4841899Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4842023Z getattr(self, test_name)() 2023-01-11T22:51:00.4842362Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4842463Z fn() 2023-01-11T22:51:00.4842830Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4842951Z test(self, **param_kwargs) 2023-01-11T22:51:00.4843304Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in
wrapper 2023-01-11T22:51:00.4843428Z return func(*args, **kwargs) 2023-01-11T22:51:00.4843672Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4843786Z self.run_subtests( 2023-01-11T22:51:00.4844117Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4844276Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4844636Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4844849Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4845231Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4845350Z output = model(*input) 2023-01-11T22:51:00.4845674Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4845810Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4846164Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4846338Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4846700Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4846819Z _lazy_init(state, module) 2023-01-11T22:51:00.4847173Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4847339Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4847784Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4847938Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.4848257Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4848385Z return func(*args, **kwargs) 2023-01-11T22:51:00.4848759Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4848861Z p_assert( 2023-01-11T22:51:00.4849193Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4849322Z traceback.print_stack() 2023-01-11T22:51:00.4849566Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:51:00.4849810Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:51:00.4850189Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:51:00.4850583Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:51:00.4851324Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4852061Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4852195Z File "<string>", line 1, in <module> 2023-01-11T22:51:00.4852406Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4852546Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4852749Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4852898Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4853109Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4853195Z self.run() 2023-01-11T22:51:00.4853397Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4853542Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4853941Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4854075Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4854441Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4854565Z getattr(self, test_name)() 2023-01-11T22:51:00.4854922Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4855009Z fn() 2023-01-11T22:51:00.4855373Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4855495Z test(self, **param_kwargs) 2023-01-11T22:51:00.4855851Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4855975Z return func(*args, **kwargs) 2023-01-11T22:51:00.4856230Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4856343Z self.run_subtests( 2023-01-11T22:51:00.4856995Z File
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4857159Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4857536Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4857691Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4858064Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4858184Z output = model(*input) 2023-01-11T22:51:00.4858508Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4858645Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4859023Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4859184Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4859550Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4859671Z _lazy_init(state, module) 2023-01-11T22:51:00.4860020Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4860188Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4860583Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4860728Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4861065Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4861176Z return func(*args, **kwargs) 2023-01-11T22:51:00.4861554Z File
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4861655Z p_assert( 2023-01-11T22:51:00.4861987Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4862114Z traceback.print_stack() 2023-01-11T22:51:00.4862243Z File "<string>", line 1, in <module> 2023-01-11T22:51:00.4862451Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4862593Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4862776Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4862927Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4863136Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4863313Z self.run() 2023-01-11T22:51:00.4863512Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4863662Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4864002Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4864117Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4864474Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4864600Z getattr(self, test_name)() 2023-01-11T22:51:00.4864956Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4865053Z fn() 2023-01-11T22:51:00.4865415Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4865542Z test(self, **param_kwargs) 2023-01-11T22:51:00.4865896Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in
wrapper 2023-01-11T22:51:00.4866049Z return func(*args, **kwargs) 2023-01-11T22:51:00.4866304Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4866419Z self.run_subtests( 2023-01-11T22:51:00.4866773Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4866935Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4867296Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4867451Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4867820Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4867927Z output = model(*input) 2023-01-11T22:51:00.4868250Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4868389Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4868760Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4868934Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4869296Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4869416Z _lazy_init(state, module) 2023-01-11T22:51:00.4869765Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4869914Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4870308Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4870456Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.4870793Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4870917Z return func(*args, **kwargs) 2023-01-11T22:51:00.4871293Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4871395Z p_assert( 2023-01-11T22:51:00.4871729Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4871837Z traceback.print_stack() 2023-01-11T22:51:00.4872077Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2023-01-11T22:51:00.4872318Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2023-01-11T22:51:00.4872776Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:51:00.4873171Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:51:00.4873913Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4874649Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4874783Z File "", line 1, in 2023-01-11T22:51:00.4874993Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4875134Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4875365Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4875524Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4875737Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4875843Z self.run() 2023-01-11T22:51:00.4876051Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4876198Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4876540Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4876672Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4877013Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4877142Z getattr(self, test_name)() 2023-01-11T22:51:00.4877505Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4877602Z fn() 2023-01-11T22:51:00.4877731Z File "", line 1, in 2023-01-11T22:51:00.4878095Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4878216Z test(self, **param_kwargs) 2023-01-11T22:51:00.4878571Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4878676Z return func(*args, **kwargs) 2023-01-11T22:51:00.4878883Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4879024Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4879272Z File 
"/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4879387Z self.run_subtests( 2023-01-11T22:51:00.4879588Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4879736Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4880067Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4880226Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4880435Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4880537Z self.run() 2023-01-11T22:51:00.4880897Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4881048Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4881250Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4881453Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4881814Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4881938Z output = model(*input) 2023-01-11T22:51:00.4882274Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4882407Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4882733Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4882874Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4883232Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4883359Z getattr(self, test_name)() 2023-01-11T22:51:00.4883712Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4883888Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4884288Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4884396Z fn() 2023-01-11T22:51:00.4884763Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4884887Z _lazy_init(state, module) 2023-01-11T22:51:00.4885249Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4885372Z test(self, **param_kwargs) 2023-01-11T22:51:00.4885703Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4885873Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4886235Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4886360Z return func(*args, **kwargs) 2023-01-11T22:51:00.4886759Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4886902Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4887147Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4887262Z self.run_subtests( 2023-01-11T22:51:00.4887579Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4887704Z return func(*args, **kwargs) 2023-01-11T22:51:00.4888052Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 
2023-01-11T22:51:00.4888213Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4888594Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4888695Z p_assert( 2023-01-11T22:51:00.4889059Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4889210Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4889528Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4889655Z traceback.print_stack() 2023-01-11T22:51:00.4890025Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4890143Z output = model(*input) 2023-01-11T22:51:00.4890464Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4890602Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4891037Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4891217Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4891565Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4891687Z _lazy_init(state, module) 2023-01-11T22:51:00.4892036Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4892202Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4892597Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4892789Z handle.init_flat_param_attributes() 
2023-01-11T22:51:00.4893130Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4893258Z return func(*args, **kwargs) 2023-01-11T22:51:00.4893687Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4893779Z p_assert( 2023-01-11T22:51:00.4894117Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4894246Z traceback.print_stack() 2023-01-11T22:51:00.4894490Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 1 2023-01-11T22:51:00.4894731Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 0 2023-01-11T22:51:00.4895128Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:51:00.4895522Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:51:00.4896276Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4897245Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4897384Z File "", line 1, in 2023-01-11T22:51:00.4897577Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4897723Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4897935Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4898084Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4898298Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4898402Z self.run() 2023-01-11T22:51:00.4898600Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4898728Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4899073Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4899207Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4899569Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4899694Z getattr(self, test_name)() 2023-01-11T22:51:00.4900054Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4900241Z fn() 2023-01-11T22:51:00.4900614Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4900723Z test(self, **param_kwargs) 2023-01-11T22:51:00.4901081Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4901205Z return func(*args, **kwargs) 2023-01-11T22:51:00.4901449Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4901564Z self.run_subtests( 2023-01-11T22:51:00.4901913Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4902073Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4902432Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4902571Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4903014Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4903145Z output = model(*input) 2023-01-11T22:51:00.4903475Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4903618Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4903992Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4904166Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4904530Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4904633Z _lazy_init(state, module) 2023-01-11T22:51:00.4904986Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4905158Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4905557Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4905699Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4906037Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4906162Z return func(*args, **kwargs) 2023-01-11T22:51:00.4906537Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4906621Z p_assert( 2023-01-11T22:51:00.4906957Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4907082Z traceback.print_stack() 2023-01-11T22:51:00.4907216Z File "", line 1, in 2023-01-11T22:51:00.4907425Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4907566Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4907766Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4907916Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4908109Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4908213Z self.run() 2023-01-11T22:51:00.4908412Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4908559Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4908897Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4909028Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4909383Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4909566Z getattr(self, test_name)() 2023-01-11T22:51:00.4909910Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4910008Z fn() 2023-01-11T22:51:00.4910371Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4910493Z test(self, **param_kwargs) 2023-01-11T22:51:00.4910849Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.4910973Z return func(*args, **kwargs) 2023-01-11T22:51:00.4911219Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4911315Z self.run_subtests( 2023-01-11T22:51:00.4911668Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4911831Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4912237Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4912401Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4912778Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4912898Z output = model(*input) 2023-01-11T22:51:00.4913223Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4913343Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4913714Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4913888Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4914258Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4914378Z _lazy_init(state, module) 2023-01-11T22:51:00.4914733Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4914899Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4915295Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4915439Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.4915757Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4915884Z return func(*args, **kwargs) 2023-01-11T22:51:00.4916263Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4916367Z p_assert( 2023-01-11T22:51:00.4916702Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4916829Z traceback.print_stack() 2023-01-11T22:51:00.4917072Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 1 2023-01-11T22:51:00.4917315Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 0 2023-01-11T22:51:00.4917697Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:51:00.4918440Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4918629Z File "", line 1, in 2023-01-11T22:51:00.4918842Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4918985Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4919190Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4919342Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4919552Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4919638Z self.run() 2023-01-11T22:51:00.4919841Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4919985Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4920329Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4920461Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4920823Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4920952Z getattr(self, test_name)() 2023-01-11T22:51:00.4921355Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4921441Z fn() 2023-01-11T22:51:00.4921811Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4921935Z test(self, **param_kwargs) 2023-01-11T22:51:00.4922295Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4922421Z return func(*args, **kwargs) 2023-01-11T22:51:00.4922667Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4922782Z self.run_subtests( 2023-01-11T22:51:00.4923134Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4923281Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4923647Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4923798Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4924170Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4924288Z output = model(*input) 2023-01-11T22:51:00.4924611Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4924747Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4925120Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4925275Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4925646Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4925765Z _lazy_init(state, module) 2023-01-11T22:51:00.4926119Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4926285Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4926680Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4926821Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4927161Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4927268Z return func(*args, **kwargs) 2023-01-11T22:51:00.4927645Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4927804Z p_assert( 2023-01-11T22:51:00.4928143Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4928273Z traceback.print_stack() 2023-01-11T22:51:00.4928669Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:51:00.4929413Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4929543Z File "", line 1, in 2023-01-11T22:51:00.4929752Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4929876Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4930081Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4930231Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4930521Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4930633Z self.run() 2023-01-11T22:51:00.4930835Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4930981Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4931320Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4931437Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4931796Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4931920Z getattr(self, test_name)() 2023-01-11T22:51:00.4932277Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4932381Z fn() 2023-01-11T22:51:00.4932744Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4932869Z test(self, **param_kwargs) 2023-01-11T22:51:00.4933224Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4933332Z return func(*args, **kwargs) 2023-01-11T22:51:00.4933575Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4933687Z self.run_subtests( 2023-01-11T22:51:00.4934037Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4934198Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4934558Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4934711Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4935088Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4935190Z output = model(*input) 2023-01-11T22:51:00.4935512Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4935648Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4936022Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4936196Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4936772Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.4936907Z _lazy_init(state, module) 2023-01-11T22:51:00.4937271Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4937501Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4937906Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4938052Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4938391Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4938517Z return func(*args, **kwargs) 2023-01-11T22:51:00.4938890Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4938993Z p_assert( 2023-01-11T22:51:00.4939325Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4939434Z traceback.print_stack() 2023-01-11T22:51:00.4939681Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 1 2023-01-11T22:51:00.4939979Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 0 2023-01-11T22:51:00.4940390Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:51:00.4940781Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:51:00.4941526Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4942262Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4942400Z File "", line 1, in 2023-01-11T22:51:00.4942609Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4942751Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4942936Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4943087Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4943297Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4943400Z self.run() 2023-01-11T22:51:00.4943600Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4943745Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4944084Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4944219Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4944564Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4944686Z getattr(self, test_name)() 2023-01-11T22:51:00.4945044Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4945142Z fn() 2023-01-11T22:51:00.4945502Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4945625Z test(self, **param_kwargs) 2023-01-11T22:51:00.4945982Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 
167, in wrapper 2023-01-11T22:51:00.4946090Z return func(*args, **kwargs) 2023-01-11T22:51:00.4946336Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4946508Z self.run_subtests( 2023-01-11T22:51:00.4946864Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4947026Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4947385Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4947537Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4947907Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4948010Z output = model(*input) 2023-01-11T22:51:00.4948330Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4948466Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4948844Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4949017Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4949427Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4949556Z _lazy_init(state, module) 2023-01-11T22:51:00.4949913Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4950081Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4950461Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4950604Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.4950939Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4951067Z return func(*args, **kwargs) 2023-01-11T22:51:00.4951443Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4951546Z p_assert( 2023-01-11T22:51:00.4951880Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4952005Z traceback.print_stack() 2023-01-11T22:51:00.4952117Z File "", line 1, in 2023-01-11T22:51:00.4952322Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4952462Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4952663Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4952811Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4953020Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4953127Z self.run() 2023-01-11T22:51:00.4953309Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4953454Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4953796Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4953929Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4954286Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4954408Z getattr(self, test_name)() 2023-01-11T22:51:00.4954766Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4954864Z fn() 2023-01-11T22:51:00.4955206Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4955329Z test(self, **param_kwargs) 2023-01-11T22:51:00.4955745Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4955872Z return func(*args, **kwargs) 2023-01-11T22:51:00.4956122Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4956234Z self.run_subtests( 2023-01-11T22:51:00.4956579Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4956742Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4957082Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4957236Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4957608Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4957730Z output = model(*input) 2023-01-11T22:51:00.4958053Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4958247Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4958633Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4958808Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4959153Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4959273Z _lazy_init(state, module) 2023-01-11T22:51:00.4959622Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 
2023-01-11T22:51:00.4959788Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4960184Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4960328Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4960665Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4960790Z return func(*args, **kwargs) 2023-01-11T22:51:00.4961148Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4961253Z p_assert( 2023-01-11T22:51:00.4961586Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4961711Z traceback.print_stack() 2023-01-11T22:51:00.4961953Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 0 2023-01-11T22:51:00.4962198Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 1 2023-01-11T22:51:00.4962600Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:51:00.4962991Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:51:00.4963738Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4963853Z File "", line 1, in 2023-01-11T22:51:00.4964062Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4964202Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4964404Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4964552Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4964820Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4964926Z self.run() 2023-01-11T22:51:00.4965132Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4965263Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4965605Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4965737Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4966099Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4966222Z getattr(self, test_name)() 2023-01-11T22:51:00.4966582Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4966682Z fn() 2023-01-11T22:51:00.4967104Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4967218Z test(self, **param_kwargs) 2023-01-11T22:51:00.4967635Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4967768Z return func(*args, **kwargs) 2023-01-11T22:51:00.4968015Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4968131Z self.run_subtests( 2023-01-11T22:51:00.4968484Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4968644Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4969005Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4969141Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4969518Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4969637Z output = model(*input) 2023-01-11T22:51:00.4969964Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4970105Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4970479Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4970653Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4971015Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4971120Z _lazy_init(state, module) 2023-01-11T22:51:00.4971470Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4971638Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4972034Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4972177Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4972516Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4972639Z return func(*args, **kwargs) 2023-01-11T22:51:00.4973012Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4973097Z p_assert( 2023-01-11T22:51:00.4973431Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4973556Z traceback.print_stack() 2023-01-11T22:51:00.4974302Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4974494Z File "", line 1, in 2023-01-11T22:51:00.4974705Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4974848Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4975049Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4975181Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4975394Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4975498Z self.run() 2023-01-11T22:51:00.4975699Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4975845Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4976190Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4976321Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4976974Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4977098Z getattr(self, test_name)() 2023-01-11T22:51:00.4977468Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4977569Z fn() 
2023-01-11T22:51:00.4977933Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4978056Z test(self, **param_kwargs) 2023-01-11T22:51:00.4978411Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.4978535Z return func(*args, **kwargs) 2023-01-11T22:51:00.4978780Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4978881Z self.run_subtests( 2023-01-11T22:51:00.4979233Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4979393Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4979755Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4979908Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4980281Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4980400Z output = model(*input) 2023-01-11T22:51:00.4980720Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4980841Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4981219Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4981390Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4981759Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4981879Z _lazy_init(state, module) 2023-01-11T22:51:00.4982232Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4982397Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.4982791Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4982916Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4983251Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4983451Z return func(*args, **kwargs) 2023-01-11T22:51:00.4983838Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4983942Z p_assert( 2023-01-11T22:51:00.4984277Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4984403Z traceback.print_stack() 2023-01-11T22:51:00.4984645Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 1 2023-01-11T22:51:00.4984871Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 0 2023-01-11T22:51:00.4985270Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:51:00.4985661Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:51:00.4986451Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.4987199Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.4987335Z File "", line 1, in 2023-01-11T22:51:00.4987548Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4987690Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4987892Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4988046Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4988239Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4988347Z self.run() 2023-01-11T22:51:00.4988548Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4988695Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4989036Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4989168Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4989528Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4989650Z getattr(self, test_name)() 2023-01-11T22:51:00.4989989Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4990090Z fn() 2023-01-11T22:51:00.4990453Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.4990576Z test(self, **param_kwargs) 2023-01-11T22:51:00.4990935Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 
167, in wrapper 2023-01-11T22:51:00.4991060Z return func(*args, **kwargs) 2023-01-11T22:51:00.4991304Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.4991418Z self.run_subtests( 2023-01-11T22:51:00.4991750Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.4991911Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.4992271Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.4992490Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.4992912Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.4993041Z output = model(*input) 2023-01-11T22:51:00.4993368Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.4993508Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.4993619Z File "", line 1, in 2023-01-11T22:51:00.4993992Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.4994164Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.4994530Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.4994651Z _lazy_init(state, module) 2023-01-11T22:51:00.4994861Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.4995001Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.4995405Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.4995562Z _share_state_and_init_handle_attrs(state, 
root_module) 2023-01-11T22:51:00.4995766Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.4995918Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.4996318Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.4996461Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.4996672Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.4996780Z self.run() 2023-01-11T22:51:00.4997119Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.4997230Z return func(*args, **kwargs) 2023-01-11T22:51:00.4997434Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.4997580Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.4997956Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.4998058Z p_assert( 2023-01-11T22:51:00.4998392Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.4998524Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.4998839Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.4998965Z traceback.print_stack() 2023-01-11T22:51:00.4999324Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.4999451Z getattr(self, test_name)() 2023-01-11T22:51:00.4999813Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.4999910Z fn() 2023-01-11T22:51:00.5000271Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5000393Z test(self, **param_kwargs) 2023-01-11T22:51:00.5000731Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5000855Z return func(*args, **kwargs) 2023-01-11T22:51:00.5001098Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.5001211Z self.run_subtests( 2023-01-11T22:51:00.5001562Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5001783Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5002149Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5002300Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5002655Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5002774Z output = model(*input) 2023-01-11T22:51:00.5003098Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5003234Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5003608Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5003781Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5004148Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.5004270Z _lazy_init(state, module) 2023-01-11T22:51:00.5004653Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 
2023-01-11T22:51:00.5004828Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5005230Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5005373Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5005712Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5005837Z return func(*args, **kwargs) 2023-01-11T22:51:00.5006211Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5006320Z p_assert( 2023-01-11T22:51:00.5006636Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5006765Z traceback.print_stack() 2023-01-11T22:51:00.5007073Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 1 2023-01-11T22:51:00.5007510Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 0 2023-01-11T22:51:00.5008272Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:51:00.5009044Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:51:00.5010353Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5011127Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5012092Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5012860Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5013680Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5014412Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5015138Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5015912Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5017347Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5018114Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5018252Z File "", line 1, in 2023-01-11T22:51:00.5018471Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5018598Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5018801Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5018954Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5019166Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5019269Z self.run() 2023-01-11T22:51:00.5019472Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5019617Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5019944Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5020082Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5020442Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5020569Z getattr(self, test_name)() 2023-01-11T22:51:00.5020928Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5021029Z fn() 2023-01-11T22:51:00.5021390Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5021511Z test(self, **param_kwargs) 2023-01-11T22:51:00.5021847Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5021974Z return func(*args, **kwargs) 2023-01-11T22:51:00.5022220Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.5022436Z self.run_subtests( 2023-01-11T22:51:00.5022796Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5022964Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5023328Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5023480Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5023835Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5023954Z output = model(*input) 2023-01-11T22:51:00.5024277Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5024414Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5024789Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5024965Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5025396Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.5025528Z _lazy_init(state, module) 2023-01-11T22:51:00.5025865Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5026033Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5026429Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5026572Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5026908Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5027035Z return func(*args, **kwargs) 2023-01-11T22:51:00.5027415Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5027516Z p_assert( 2023-01-11T22:51:00.5027855Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5027965Z traceback.print_stack() 2023-01-11T22:51:00.5028095Z File "", line 1, in 2023-01-11T22:51:00.5028303Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5028443Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5028644Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5028794Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5029004Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5029090Z self.run() 2023-01-11T22:51:00.5029291Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5029439Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5029783Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5029916Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5030281Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5030407Z getattr(self, test_name)() 2023-01-11T22:51:00.5030763Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5030843Z fn() 2023-01-11T22:51:00.5031204Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5031325Z test(self, **param_kwargs) 2023-01-11T22:51:00.5031679Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.5031861Z return func(*args, **kwargs) 2023-01-11T22:51:00.5032107Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.5032226Z self.run_subtests( 2023-01-11T22:51:00.5032580Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5032722Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5033081Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5033235Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5033608Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5033728Z output = model(*input) 2023-01-11T22:51:00.5034051Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5034193Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5034627Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5034790Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5035158Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.5035277Z _lazy_init(state, module) 2023-01-11T22:51:00.5035630Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5035800Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5036195Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5036341Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.5036677Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5036784Z return func(*args, **kwargs) 2023-01-11T22:51:00.5037161Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5037264Z p_assert( 2023-01-11T22:51:00.5037596Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5037723Z traceback.print_stack() 2023-01-11T22:51:00.5037968Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 0 2023-01-11T22:51:00.5038210Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 1 2023-01-11T22:51:00.5038611Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:51:00.5038992Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:51:00.5039739Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5040477Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5040607Z File "", line 1, in 2023-01-11T22:51:00.5040815Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5041024Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5041155Z File "", line 1, in 2023-01-11T22:51:00.5041361Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5041512Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5041723Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5041810Z self.run() 2023-01-11T22:51:00.5042017Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5042157Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5042359Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5042504Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5042702Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5042851Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5043183Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5043318Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5043577Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5043686Z self.run() 2023-01-11T22:51:00.5044048Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5044172Z getattr(self, test_name)() 2023-01-11T22:51:00.5044375Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5044522Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5044862Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5044960Z fn() 2023-01-11T22:51:00.5045293Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5045429Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5045795Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5045919Z test(self, **param_kwargs) 2023-01-11T22:51:00.5046271Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5046392Z getattr(self, test_name)() 2023-01-11T22:51:00.5046732Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5046856Z return func(*args, **kwargs) 2023-01-11T22:51:00.5047210Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5047306Z fn() 2023-01-11T22:51:00.5047551Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.5047668Z self.run_subtests( 2023-01-11T22:51:00.5048034Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5048155Z test(self, **param_kwargs) 2023-01-11T22:51:00.5048488Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5048650Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5049003Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5049128Z return func(*args, **kwargs) 2023-01-11T22:51:00.5049485Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5049636Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5049879Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.5050050Z self.run_subtests( 2023-01-11T22:51:00.5050414Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5050534Z output = model(*input) 2023-01-11T22:51:00.5050882Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5051044Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5051369Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5051506Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5051864Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5052016Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5052376Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5052595Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5052976Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5053101Z output = model(*input) 2023-01-11T22:51:00.5053463Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.5053584Z _lazy_init(state, module) 2023-01-11T22:51:00.5053906Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 
2023-01-11T22:51:00.5054042Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5054374Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5054546Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5054916Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5055090Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5055488Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5055629Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5055988Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.5056109Z _lazy_init(state, module) 2023-01-11T22:51:00.5056427Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5057081Z return func(*args, **kwargs) 2023-01-11T22:51:00.5057581Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5057755Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5058141Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5058248Z p_assert( 2023-01-11T22:51:00.5058647Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5058788Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5059105Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5059236Z traceback.print_stack() 
2023-01-11T22:51:00.5059572Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5059697Z return func(*args, **kwargs) 2023-01-11T22:51:00.5060173Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5060277Z p_assert( 2023-01-11T22:51:00.5060615Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5060742Z traceback.print_stack() 2023-01-11T22:51:00.5060968Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 1 2023-01-11T22:51:00.5061207Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 0 2023-01-11T22:51:00.5061605Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:51:00.5062352Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5062486Z File "", line 1, in 2023-01-11T22:51:00.5062757Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5062910Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5063114Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5063263Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5063457Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5063563Z self.run() 2023-01-11T22:51:00.5063764Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5063909Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5064253Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5064393Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5064752Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5064881Z getattr(self, test_name)() 2023-01-11T22:51:00.5065223Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5065321Z fn() 2023-01-11T22:51:00.5065681Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5065803Z test(self, **param_kwargs) 2023-01-11T22:51:00.5066156Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5066280Z return func(*args, **kwargs) 2023-01-11T22:51:00.5066524Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.5066623Z self.run_subtests( 2023-01-11T22:51:00.5066973Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5067133Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5067495Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5067647Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5068020Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5068138Z output = model(*input) 2023-01-11T22:51:00.5068460Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5068596Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5068952Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5069186Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5069560Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.5069682Z _lazy_init(state, module) 2023-01-11T22:51:00.5070035Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5070203Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5070600Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5070742Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5071059Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5071184Z return func(*args, **kwargs) 2023-01-11T22:51:00.5071563Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5071667Z p_assert( 2023-01-11T22:51:00.5072045Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5072178Z traceback.print_stack() 2023-01-11T22:51:00.5072576Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:51:00.5073317Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5073447Z File "", line 1, in 2023-01-11T22:51:00.5073638Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5073784Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5073984Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5074137Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5074350Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5074453Z self.run() 2023-01-11T22:51:00.5074653Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5074782Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5075122Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5075254Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5075612Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5075735Z getattr(self, test_name)() 2023-01-11T22:51:00.5076096Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5076195Z fn() 2023-01-11T22:51:00.5076565Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5076670Z test(self, **param_kwargs) 2023-01-11T22:51:00.5077023Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5077148Z return func(*args, **kwargs) 2023-01-11T22:51:00.5077393Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.5077507Z self.run_subtests( 2023-01-11T22:51:00.5077855Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5078014Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5078463Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5078597Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5078974Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5079096Z output = model(*input) 2023-01-11T22:51:00.5079423Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5079561Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5079938Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5080109Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5080474Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.5080579Z _lazy_init(state, module) 2023-01-11T22:51:00.5080927Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5081137Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5081542Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5081687Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5082024Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5082152Z return func(*args, **kwargs) 2023-01-11T22:51:00.5082525Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5082609Z p_assert( 2023-01-11T22:51:00.5082942Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5083071Z traceback.print_stack() 2023-01-11T22:51:00.5083315Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 1 2023-01-11T22:51:00.5083558Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 0 2023-01-11T22:51:00.5083954Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:51:00.5084699Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5084829Z File "", line 1, in 2023-01-11T22:51:00.5085039Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5085166Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5085367Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5085520Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5085734Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5085838Z self.run() 2023-01-11T22:51:00.5086038Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5086182Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5086522Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5086639Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5086999Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5087127Z getattr(self, test_name)() 2023-01-11T22:51:00.5087544Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5087643Z fn() 2023-01-11T22:51:00.5088011Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5088134Z test(self, **param_kwargs) 2023-01-11T22:51:00.5088488Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5088594Z return func(*args, **kwargs) 2023-01-11T22:51:00.5088838Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.5088951Z self.run_subtests( 2023-01-11T22:51:00.5089301Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5089462Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5089824Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5089977Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5090392Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5090503Z output = model(*input) 2023-01-11T22:51:00.5090830Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5090967Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5091341Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5091514Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5091879Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.5092003Z _lazy_init(state, module) 2023-01-11T22:51:00.5092355Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5092509Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5092958Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5093102Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5093446Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5093569Z return func(*args, **kwargs) 2023-01-11T22:51:00.5093946Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5094048Z p_assert( 2023-01-11T22:51:00.5094384Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5094497Z traceback.print_stack() 2023-01-11T22:51:00.5094897Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:51:00.5095638Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5095769Z File "", line 1, in 2023-01-11T22:51:00.5095976Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5096117Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5096318Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5096467Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5097392Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5097486Z self.run() 2023-01-11T22:51:00.5097693Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5097843Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5098198Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5098332Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5098693Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5098817Z getattr(self, test_name)() 2023-01-11T22:51:00.5099156Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5099259Z fn() 2023-01-11T22:51:00.5099623Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5099750Z test(self, **param_kwargs) 2023-01-11T22:51:00.5100198Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5100335Z return func(*args, **kwargs) 2023-01-11T22:51:00.5100586Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:51:00.5100701Z self.run_subtests( 2023-01-11T22:51:00.5101040Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5101201Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5101559Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5101712Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5102085Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5102209Z output = model(*input) 2023-01-11T22:51:00.5102535Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5102672Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5103030Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5103206Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5103570Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.5103691Z _lazy_init(state, module) 2023-01-11T22:51:00.5104043Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5104209Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5104609Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5104755Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5105091Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5105198Z return func(*args, **kwargs) 2023-01-11T22:51:00.5105570Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5105672Z p_assert( 2023-01-11T22:51:00.5106006Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5106132Z traceback.print_stack() 2023-01-11T22:51:00.5106374Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 1 2023-01-11T22:51:00.5106690Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 0 2023-01-11T22:51:00.5107097Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2023-01-11T22:51:00.5107828Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5108217Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 
2023-01-11T22:51:00.5108941Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5109184Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 1 2023-01-11T22:51:00.5109459Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 0 2023-01-11T22:51:00.5109857Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:51:00.5110248Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:51:00.5110490Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 1 2023-01-11T22:51:00.5110725Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 0 2023-01-11T22:51:00.5111112Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2023-01-11T22:51:00.5111507Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2023-01-11T22:51:00.5111730Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 1 2023-01-11T22:51:00.5111960Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 0 2023-01-11T22:51:00.5112347Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 
2023-01-11T22:51:00.5112737Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 2023-01-11T22:51:00.5112972Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 1 2023-01-11T22:51:00.5113202Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 0 2023-01-11T22:51:00.5113591Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:51:00.5113979Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:51:00.5114213Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 1 2023-01-11T22:51:00.5114424Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 0 2023-01-11T22:51:00.5114809Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2023-01-11T22:51:00.5115192Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2023-01-11T22:51:00.5115426Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 1 2023-01-11T22:51:00.5115658Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 0 2023-01-11T22:51:00.5116108Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:51:00.5116498Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:51:00.5117242Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5117484Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 1 2023-01-11T22:51:00.5117718Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 0 2023-01-11T22:51:00.5118092Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:51:00.5118523Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:51:00.5119271Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5120005Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5120746Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5121480Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5122205Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5122930Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5123659Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5124385Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5125108Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5125883Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5126607Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5127371Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5128104Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5128819Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5129542Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5130268Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5130987Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5131707Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5132426Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5133143Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5133862Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5134635Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5135355Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5136118Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5137588Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5137853Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 1 2023-01-11T22:51:00.5138094Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 0 2023-01-11T22:51:00.5138505Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 
2023-01-11T22:51:00.5138905Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 2023-01-11T22:51:00.5139642Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5140365Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5140604Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 1 2023-01-11T22:51:00.5140818Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 0 2023-01-11T22:51:00.5141214Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:51:00.5141605Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:51:00.5141847Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 1 2023-01-11T22:51:00.5142084Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 0 2023-01-11T22:51:00.5142475Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 2023-01-11T22:51:00.5142864Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 
2023-01-11T22:51:00.5143098Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 1 2023-01-11T22:51:00.5143429Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 0 2023-01-11T22:51:00.5143825Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:51:00.5144195Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:51:00.5144432Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:26 to store for rank: 1 2023-01-11T22:51:00.5144663Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:26 to store for rank: 0 2023-01-11T22:51:00.5145050Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:26 with 2 nodes. 2023-01-11T22:51:00.5145436Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:26 with 2 nodes. 2023-01-11T22:51:00.5145675Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:27 to store for rank: 1 2023-01-11T22:51:00.5145968Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:27 to store for rank: 0 2023-01-11T22:51:00.5146365Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:27 with 2 nodes. 2023-01-11T22:51:00.5146750Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:27 with 2 nodes. 
2023-01-11T22:51:00.5146968Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:28 to store for rank: 0 2023-01-11T22:51:00.5147200Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:28 to store for rank: 1 2023-01-11T22:51:00.5147589Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:28 with 2 nodes. 2023-01-11T22:51:00.5147981Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:28 with 2 nodes. 2023-01-11T22:51:00.5148216Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:29 to store for rank: 0 2023-01-11T22:51:00.5148449Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:29 to store for rank: 1 2023-01-11T22:51:00.5148837Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:29 with 2 nodes. 2023-01-11T22:51:00.5149219Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:29 with 2 nodes. 2023-01-11T22:51:00.5149452Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:30 to store for rank: 0 2023-01-11T22:51:00.5149666Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:30 to store for rank: 1 2023-01-11T22:51:00.5150053Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:30 with 2 nodes. 2023-01-11T22:51:00.5150444Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:30 with 2 nodes. 
2023-01-11T22:51:00.5150682Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:31 to store for rank: 1 2023-01-11T22:51:00.5150913Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:31 to store for rank: 0 2023-01-11T22:51:00.5151299Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:31 with 2 nodes. 2023-01-11T22:51:00.5151680Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:31 with 2 nodes. 2023-01-11T22:51:00.5151915Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:32 to store for rank: 0 2023-01-11T22:51:00.5152146Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:32 to store for rank: 1 2023-01-11T22:51:00.5152588Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:32 with 2 nodes. 2023-01-11T22:51:00.5152957Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:32 with 2 nodes. 2023-01-11T22:51:00.5153193Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:33 to store for rank: 1 2023-01-11T22:51:00.5153422Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:33 to store for rank: 0 2023-01-11T22:51:00.5153807Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:33 with 2 nodes. 2023-01-11T22:51:00.5154188Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:33 with 2 nodes. 2023-01-11T22:51:00.5154923Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5155699Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5156444Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5157177Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5157907Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5158628Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5159351Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5160082Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5160803Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5161523Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5162304Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5163027Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5163752Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5164533Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5165265Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5165987Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5166717Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5167437Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5168157Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5168882Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5169604Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5170321Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5171100Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5171823Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5172543Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5173320Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5174052Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5174769Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5175498Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5176215Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5177746Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5178497Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5178742Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:34 to store for rank: 0 2023-01-11T22:51:00.5178962Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:34 to store for rank: 1 2023-01-11T22:51:00.5179362Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:34 with 2 nodes. 2023-01-11T22:51:00.5179753Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:34 with 2 nodes. 2023-01-11T22:51:00.5180586Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5181310Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5181551Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:35 to store for rank: 1 2023-01-11T22:51:00.5181785Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:35 to store for rank: 0 2023-01-11T22:51:00.5182180Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:35 with 2 nodes. 2023-01-11T22:51:00.5182628Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:35 with 2 nodes. 2023-01-11T22:51:00.5183367Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5184090Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5184329Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:36 to store for rank: 0 2023-01-11T22:51:00.5184565Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:36 to store for rank: 1 2023-01-11T22:51:00.5184941Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:36 with 2 nodes. 2023-01-11T22:51:00.5185329Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:36 with 2 nodes. 2023-01-11T22:51:00.5186055Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5186771Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5187013Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:37 to store for rank: 1 2023-01-11T22:51:00.5187244Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:37 to store for rank: 0 2023-01-11T22:51:00.5187635Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:37 with 2 nodes. 2023-01-11T22:51:00.5188020Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:37 with 2 nodes. 2023-01-11T22:51:00.5188746Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5189527Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5189641Z dist init r=0, world=2 2023-01-11T22:51:00.5189968Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.5190281Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.5190570Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.5190877Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.5191220Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.5191527Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.5191827Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 
2023-01-11T22:51:00.5192126Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.5192423Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.5192777Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.5193082Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.5193384Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.5193498Z dist init r=1, world=2 2023-01-11T22:51:00.5193823Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.5194117Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.5194429Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.5194729Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.5195028Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 
2023-01-11T22:51:00.5195327Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.5195624Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.5195986Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.5196285Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.5196584Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.5196881Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.5197179Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.5197266Z ok (7.316s) 2023-01-11T22:51:00.5197662Z test_mixture_of_experts_with_delay_before_free_offload_false_no_shard (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 92350 2023-01-11T22:51:00.5197889Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 92351 2023-01-11T22:51:00.5198273Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.5198449Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.5198832Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.5199023Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.5199384Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.5199562Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.5199919Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.5200110Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.5200354Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.5200594Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.5200989Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.5201381Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:51:00.5201608Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.5201837Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.5202852Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.5202965Z warnings.warn( 2023-01-11T22:51:00.5203187Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:51:00.5204184Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.5204353Z warnings.warn( 2023-01-11T22:51:00.5204592Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:51:00.5204985Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:51:00.5205377Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:51:00.5206113Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5206894Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5207145Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:51:00.5207385Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:51:00.5207778Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:51:00.5208165Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:51:00.5208402Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2023-01-11T22:51:00.5208627Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2023-01-11T22:51:00.5209015Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:51:00.5209397Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 
2023-01-11T22:51:00.5209633Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 0 2023-01-11T22:51:00.5209867Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 1 2023-01-11T22:51:00.5210248Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:51:00.5210629Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:51:00.5210869Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 0 2023-01-11T22:51:00.5211108Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 1 2023-01-11T22:51:00.5211471Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:51:00.5211852Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:51:00.5212084Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 0 2023-01-11T22:51:00.5212471Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:51:00.5212708Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 1 2023-01-11T22:51:00.5213088Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:51:00.5213383Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 0 2023-01-11T22:51:00.5213778Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 
2023-01-11T22:51:00.5214016Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 1 2023-01-11T22:51:00.5214382Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:51:00.5214618Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 0 2023-01-11T22:51:00.5214852Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 1 2023-01-11T22:51:00.5215235Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:51:00.5215622Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:51:00.5216420Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5217874Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5218617Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5219360Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5220088Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5220813Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5221546Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5222271Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5222988Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5223809Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5224530Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5225251Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5226042Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5226849Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5227723Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5228583Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5229404Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5230241Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5231045Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5231769Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5232489Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5233269Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5233994Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5234711Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5235469Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5236198Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5236918Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5237643Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5238364Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5239080Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5239807Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5240527Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5241245Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5242260Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5243002Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5243723Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5244501Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5245231Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5245949Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5246675Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5246922Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 0 2023-01-11T22:51:00.5247162Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 1 2023-01-11T22:51:00.5247562Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:51:00.5247956Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:51:00.5248183Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 0 2023-01-11T22:51:00.5248425Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 1 2023-01-11T22:51:00.5248821Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 
2023-01-11T22:51:00.5249217Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:51:00.5249458Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 0 2023-01-11T22:51:00.5249694Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 1 2023-01-11T22:51:00.5250083Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:51:00.5250523Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:51:00.5250764Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 0 2023-01-11T22:51:00.5250996Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 1 2023-01-11T22:51:00.5251365Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:51:00.5251753Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:51:00.5251992Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 0 2023-01-11T22:51:00.5252224Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 1 2023-01-11T22:51:00.5252613Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2023-01-11T22:51:00.5253038Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 
2023-01-11T22:51:00.5253278Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 0 2023-01-11T22:51:00.5253511Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 1 2023-01-11T22:51:00.5253899Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:51:00.5254264Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:51:00.5254498Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 0 2023-01-11T22:51:00.5254729Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 1 2023-01-11T22:51:00.5255120Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2023-01-11T22:51:00.5255511Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2023-01-11T22:51:00.5255746Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 0 2023-01-11T22:51:00.5255976Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 1 2023-01-11T22:51:00.5256362Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 2023-01-11T22:51:00.5257399Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 
2023-01-11T22:51:00.5257631Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 0 2023-01-11T22:51:00.5257867Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 1 2023-01-11T22:51:00.5258268Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:51:00.5258654Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:51:00.5258887Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 0 2023-01-11T22:51:00.5259118Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 1 2023-01-11T22:51:00.5259503Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2023-01-11T22:51:00.5259889Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2023-01-11T22:51:00.5260214Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 0 2023-01-11T22:51:00.5260449Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 1 2023-01-11T22:51:00.5260822Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:51:00.5261207Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 
2023-01-11T22:51:00.5261443Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 0 2023-01-11T22:51:00.5261673Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 1 2023-01-11T22:51:00.5262058Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:51:00.5262446Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:51:00.5263237Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5263985Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5264720Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5265456Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5266186Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5266914Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5267704Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5268438Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5269167Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5269954Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5270676Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5271401Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5272163Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5272895Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5273614Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5274345Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5275066Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5275788Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5276511Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5277232Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5277952Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5278736Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5279458Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5280176Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5280937Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5281670Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5282390Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5283122Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5283846Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5284568Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5285294Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5286016Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5286737Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5287513Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5288215Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5288937Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5289718Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5290452Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5291172Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5291895Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5292615Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5293396Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5294129Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5294852Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5295571Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5296353Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5297689Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5298430Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5299232Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5300138Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5300872Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5301606Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5302330Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5303048Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5303773Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5304496Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5305216Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5306017Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5306736Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5307460Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5308216Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5308946Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5309195Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 1 2023-01-11T22:51:00.5309434Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 0 2023-01-11T22:51:00.5309836Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 2023-01-11T22:51:00.5310233Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 2023-01-11T22:51:00.5310956Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5311678Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5311917Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 0 2023-01-11T22:51:00.5312153Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 1 2023-01-11T22:51:00.5312549Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:51:00.5313252Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5313639Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:51:00.5313883Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 0 2023-01-11T22:51:00.5314324Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 2023-01-11T22:51:00.5314572Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 1 2023-01-11T22:51:00.5314962Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 2023-01-11T22:51:00.5315694Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5315930Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 1 2023-01-11T22:51:00.5316163Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 0 2023-01-11T22:51:00.5316556Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:51:00.5316988Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:51:00.5317715Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5317829Z dist init r=1, world=2 2023-01-11T22:51:00.5317939Z dist init r=0, world=2 2023-01-11T22:51:00.5318041Z ok (26.945s) 2023-01-11T22:51:00.5318395Z test_mixture_of_experts_with_delay_before_free_offload_false_none (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 92673 2023-01-11T22:51:00.5318615Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 92674 2023-01-11T22:51:00.5318990Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.5319170Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.5319550Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.5319722Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.5320089Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.5320263Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.5320639Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.5320827Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.5321074Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.5321318Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.5321711Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.5322099Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:51:00.5322309Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.5322533Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.5323551Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.5323714Z warnings.warn( 2023-01-11T22:51:00.5323956Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:51:00.5324952Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.5325062Z warnings.warn( 2023-01-11T22:51:00.5325305Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:51:00.5325735Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:51:00.5326272Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 
2023-01-11T22:51:00.5326385Z warnings.warn( 2023-01-11T22:51:00.5326756Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:51:00.5327276Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2023-01-11T22:51:00.5327389Z warnings.warn( 2023-01-11T22:51:00.5328134Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5328861Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5329103Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:51:00.5329341Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:51:00.5329735Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:51:00.5330133Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 
2023-01-11T22:51:00.5330374Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2023-01-11T22:51:00.5330613Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2023-01-11T22:51:00.5330977Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:51:00.5331356Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:51:00.5331593Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 1 2023-01-11T22:51:00.5331984Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:51:00.5332266Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 0 2023-01-11T22:51:00.5332659Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:51:00.5332895Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 1 2023-01-11T22:51:00.5333285Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:51:00.5333523Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 0 2023-01-11T22:51:00.5333888Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:51:00.5334123Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 1 2023-01-11T22:51:00.5334513Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 
2023-01-11T22:51:00.5334749Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 0 2023-01-11T22:51:00.5335167Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:51:00.5335406Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 1 2023-01-11T22:51:00.5335796Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:51:00.5336028Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 0 2023-01-11T22:51:00.5336411Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:51:00.5337241Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 1 2023-01-11T22:51:00.5337662Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:51:00.5337901Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 0 2023-01-11T22:51:00.5338291Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:51:00.5339034Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5339770Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5340508Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5341236Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5341965Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5342791Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5343522Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5344245Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5345023Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5345758Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5346479Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5347210Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5347933Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5348659Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5349388Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5350106Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5350815Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5351605Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5352328Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5353050Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5353820Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5354547Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5355268Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5355997Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5356242Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 1 2023-01-11T22:51:00.5356480Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 0 2023-01-11T22:51:00.5356877Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:51:00.5357275Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:51:00.5357501Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 1 2023-01-11T22:51:00.5357898Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:51:00.5358140Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 0 2023-01-11T22:51:00.5358531Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:51:00.5358767Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 1 2023-01-11T22:51:00.5359153Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:51:00.5359394Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 0 2023-01-11T22:51:00.5359782Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 
2023-01-11T22:51:00.5360099Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 1 2023-01-11T22:51:00.5360317Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 0 2023-01-11T22:51:00.5360709Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:51:00.5361097Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:51:00.5361334Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 1 2023-01-11T22:51:00.5361566Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 0 2023-01-11T22:51:00.5361954Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2023-01-11T22:51:00.5362344Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2023-01-11T22:51:00.5362621Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 1 2023-01-11T22:51:00.5363015Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:51:00.5363260Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 0 2023-01-11T22:51:00.5363630Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:51:00.5363866Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 1 2023-01-11T22:51:00.5364253Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 
2023-01-11T22:51:00.5364498Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 0 2023-01-11T22:51:00.5364887Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2023-01-11T22:51:00.5365123Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 1 2023-01-11T22:51:00.5365355Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 0 2023-01-11T22:51:00.5365740Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 2023-01-11T22:51:00.5366125Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 2023-01-11T22:51:00.5366343Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 1 2023-01-11T22:51:00.5366579Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 0 2023-01-11T22:51:00.5366967Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:51:00.5367350Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:51:00.5367590Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 1 2023-01-11T22:51:00.5367822Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 0 2023-01-11T22:51:00.5368211Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2023-01-11T22:51:00.5368594Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 
2023-01-11T22:51:00.5368881Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 1 2023-01-11T22:51:00.5369251Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:51:00.5369486Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 0 2023-01-11T22:51:00.5369871Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:51:00.5370108Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 1 2023-01-11T22:51:00.5370500Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:51:00.5370739Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 0 2023-01-11T22:51:00.5371125Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:51:00.5372191Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py:1341: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:51:00.5372399Z _ext_post_unflatten_transform(subtensor.view(shape), param_extension) 2023-01-11T22:51:00.5373399Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py:1341: UserWarning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:51:00.5373603Z _ext_post_unflatten_transform(subtensor.view(shape), param_extension) 2023-01-11T22:51:00.5373848Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 1 2023-01-11T22:51:00.5374065Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 0 2023-01-11T22:51:00.5374458Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 2023-01-11T22:51:00.5374847Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 2023-01-11T22:51:00.5375584Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5375828Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 1 2023-01-11T22:51:00.5376066Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 0 2023-01-11T22:51:00.5376452Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:51:00.5377595Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 
2023-01-11T22:51:00.5377840Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 1 2023-01-11T22:51:00.5378073Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 0 2023-01-11T22:51:00.5378473Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 2023-01-11T22:51:00.5378945Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 2023-01-11T22:51:00.5379183Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 1 2023-01-11T22:51:00.5379571Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:51:00.5379809Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 0 2023-01-11T22:51:00.5380197Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:51:00.5380313Z dist init r=1, world=2 2023-01-11T22:51:00.5380422Z dist init r=0, world=2 2023-01-11T22:51:00.5380524Z ok (29.549s) 2023-01-11T22:51:00.5380870Z test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 92996 2023-01-11T22:51:00.5381155Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 92997 2023-01-11T22:51:00.5381540Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.5381715Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.5382093Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.5382284Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.5382648Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.5382822Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.5383193Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.5383369Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.5383616Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.5383862Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.5384257Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.5384649Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:51:00.5384878Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.5385106Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.5386128Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.5386243Z warnings.warn( 2023-01-11T22:51:00.5387247Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.5387412Z warnings.warn( 2023-01-11T22:51:00.5387636Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:51:00.5387879Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:51:00.5388274Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:51:00.5388810Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 
2023-01-11T22:51:00.5388922Z warnings.warn( 2023-01-11T22:51:00.5389664Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5390062Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:51:00.5390635Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2023-01-11T22:51:00.5390751Z warnings.warn( 2023-01-11T22:51:00.5391490Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5391729Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:51:00.5391948Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:51:00.5392343Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:51:00.5392733Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 
2023-01-11T22:51:00.5393021Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2023-01-11T22:51:00.5393263Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2023-01-11T22:51:00.5393650Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:51:00.5394032Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:51:00.5394270Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 1 2023-01-11T22:51:00.5394508Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 0 2023-01-11T22:51:00.5394890Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:51:00.5395256Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:51:00.5395489Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 0 2023-01-11T22:51:00.5395726Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 1 2023-01-11T22:51:00.5396110Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:51:00.5396491Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 
2023-01-11T22:51:00.5396784Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 0 2023-01-11T22:51:00.5397019Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 1 2023-01-11T22:51:00.5397415Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:51:00.5397804Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:51:00.5398022Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 0 2023-01-11T22:51:00.5398258Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 1 2023-01-11T22:51:00.5398645Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:51:00.5399031Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:51:00.5399272Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 0 2023-01-11T22:51:00.5399551Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 1 2023-01-11T22:51:00.5399943Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:51:00.5400327Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:51:00.5401071Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5401810Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5402547Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5403280Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5404012Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5404742Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5405459Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5406185Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5406968Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5407694Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5408455Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5409195Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5409923Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5410647Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5411370Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5412075Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5412799Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5413524Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5414244Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5414968Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5415741Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5416462Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5417722Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5418467Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5418713Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 1 2023-01-11T22:51:00.5418950Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 0 2023-01-11T22:51:00.5419350Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:51:00.5419750Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:51:00.5419992Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 0 2023-01-11T22:51:00.5420225Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 1 2023-01-11T22:51:00.5420618Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:51:00.5421007Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:51:00.5421227Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 0 2023-01-11T22:51:00.5421457Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 1 2023-01-11T22:51:00.5421844Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:51:00.5422244Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 
2023-01-11T22:51:00.5422483Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 0 2023-01-11T22:51:00.5422715Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 1 2023-01-11T22:51:00.5423102Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:51:00.5423491Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:51:00.5423727Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 0 2023-01-11T22:51:00.5423958Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 1 2023-01-11T22:51:00.5424406Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2023-01-11T22:51:00.5424797Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2023-01-11T22:51:00.5425033Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 0 2023-01-11T22:51:00.5425262Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 1 2023-01-11T22:51:00.5425647Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:51:00.5426031Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 
2023-01-11T22:51:00.5426264Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 0 2023-01-11T22:51:00.5426497Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 1 2023-01-11T22:51:00.5426928Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2023-01-11T22:51:00.5427305Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2023-01-11T22:51:00.5427540Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 0 2023-01-11T22:51:00.5427767Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 1 2023-01-11T22:51:00.5428153Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 2023-01-11T22:51:00.5428542Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 2023-01-11T22:51:00.5428777Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 0 2023-01-11T22:51:00.5429010Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 1 2023-01-11T22:51:00.5429395Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:51:00.5429780Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 
2023-01-11T22:51:00.5430015Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 0 2023-01-11T22:51:00.5430229Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 1 2023-01-11T22:51:00.5430617Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2023-01-11T22:51:00.5431009Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2023-01-11T22:51:00.5431248Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 0 2023-01-11T22:51:00.5431478Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 1 2023-01-11T22:51:00.5431865Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:51:00.5432253Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:51:00.5432487Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 0 2023-01-11T22:51:00.5432717Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 1 2023-01-11T22:51:00.5433085Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:51:00.5433530Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:51:00.5434551Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py:1341: UserWarning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:51:00.5434755Z _ext_post_unflatten_transform(subtensor.view(shape), param_extension) 2023-01-11T22:51:00.5435797Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py:1341: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:51:00.5436002Z _ext_post_unflatten_transform(subtensor.view(shape), param_extension) 2023-01-11T22:51:00.5436242Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 0 2023-01-11T22:51:00.5436476Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 1 2023-01-11T22:51:00.5436866Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 2023-01-11T22:51:00.5437254Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 2023-01-11T22:51:00.5437990Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5438236Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 0 2023-01-11T22:51:00.5438472Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 1 2023-01-11T22:51:00.5438844Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:51:00.5439233Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:51:00.5439472Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 0 2023-01-11T22:51:00.5439708Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 1 2023-01-11T22:51:00.5440101Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 2023-01-11T22:51:00.5440488Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 2023-01-11T22:51:00.5440726Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 0 2023-01-11T22:51:00.5440959Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 1 2023-01-11T22:51:00.5441350Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:51:00.5441717Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:51:00.5441831Z dist init r=1, world=2 2023-01-11T22:51:00.5441996Z dist init r=0, world=2 2023-01-11T22:51:00.5442098Z ok (29.349s) 2023-01-11T22:51:00.5442460Z test_mixture_of_experts_with_delay_before_free_offload_true_no_shard (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 93319 2023-01-11T22:51:00.5442678Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 93320 2023-01-11T22:51:00.5443051Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.5443227Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.5443589Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.5443778Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.5444147Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.5444323Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.5444744Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.5444938Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.5445181Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.5445425Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.5445822Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.5446191Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:51:00.5446421Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.5446650Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.5447665Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.5447777Z warnings.warn( 2023-01-11T22:51:00.5448018Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:51:00.5449015Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.5449127Z warnings.warn( 2023-01-11T22:51:00.5449364Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:51:00.5449756Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:51:00.5450495Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5450888Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:51:00.5451054Z File "<string>", line 1, in <module> 2023-01-11T22:51:00.5451269Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5451413Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5451616Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5451765Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5451980Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5452086Z self.run() 2023-01-11T22:51:00.5452287Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5452416Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5452762Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5452896Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5453262Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5453387Z getattr(self, test_name)() 2023-01-11T22:51:00.5453789Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5453892Z fn() 2023-01-11T22:51:00.5454261Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5454366Z test(self, **param_kwargs) 2023-01-11T22:51:00.5454720Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5454845Z return func(*args, **kwargs) 2023-01-11T22:51:00.5455124Z File
"/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.5455243Z self.run_subtests( 2023-01-11T22:51:00.5455596Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5455760Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5456122Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5456257Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5457043Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5457175Z output = model(*input) 2023-01-11T22:51:00.5457512Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5457650Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5458022Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5458199Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5458567Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.5458671Z _lazy_init(state, module) 2023-01-11T22:51:00.5459021Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5459191Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5459586Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5459726Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5460064Z File 
"/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5460189Z return func(*args, **kwargs) 2023-01-11T22:51:00.5460653Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5460738Z p_assert( 2023-01-11T22:51:00.5461076Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5461203Z traceback.print_stack() 2023-01-11T22:51:00.5461948Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5462080Z File "", line 1, in 2023-01-11T22:51:00.5462287Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5462429Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5462631Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5462784Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5462976Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5463146Z self.run() 2023-01-11T22:51:00.5463358Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5463505Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5463846Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5463981Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5464341Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5464446Z getattr(self, 
test_name)() 2023-01-11T22:51:00.5464800Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5464903Z fn() 2023-01-11T22:51:00.5465269Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5465392Z test(self, **param_kwargs) 2023-01-11T22:51:00.5465751Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5465877Z return func(*args, **kwargs) 2023-01-11T22:51:00.5466152Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.5466248Z self.run_subtests( 2023-01-11T22:51:00.5466599Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5466761Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5467122Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5467276Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5467655Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5467775Z output = model(*input) 2023-01-11T22:51:00.5468096Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5468215Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5468589Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5468760Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5469127Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.5469248Z _lazy_init(state, module) 2023-01-11T22:51:00.5469598Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5469818Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5470224Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5470368Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5470687Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5470813Z return func(*args, **kwargs) 2023-01-11T22:51:00.5471193Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5471296Z p_assert( 2023-01-11T22:51:00.5471632Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5471758Z traceback.print_stack() 2023-01-11T22:51:00.5472005Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:51:00.5472227Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:51:00.5472668Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:51:00.5473415Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5473549Z File "", line 1, in 2023-01-11T22:51:00.5473757Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5473898Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5474101Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5474256Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5474469Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5474555Z self.run() 2023-01-11T22:51:00.5474756Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5474902Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5475242Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5475376Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5475736Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5475856Z getattr(self, test_name)() 2023-01-11T22:51:00.5476213Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5476296Z fn() 2023-01-11T22:51:00.5476660Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5476785Z test(self, **param_kwargs) 2023-01-11T22:51:00.5477139Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5477263Z return func(*args, **kwargs) 2023-01-11T22:51:00.5477539Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.5477653Z self.run_subtests( 2023-01-11T22:51:00.5478004Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5478148Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5478509Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5478716Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5479098Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5479218Z output = model(*input) 2023-01-11T22:51:00.5479541Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5479679Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5480052Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5480206Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5480574Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.5480693Z _lazy_init(state, module) 2023-01-11T22:51:00.5481046Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5481216Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5481672Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5481820Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5482160Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5482267Z return func(*args, **kwargs) 2023-01-11T22:51:00.5482643Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5482746Z p_assert( 2023-01-11T22:51:00.5483080Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5483208Z traceback.print_stack() 2023-01-11T22:51:00.5483609Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:51:00.5484350Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5484481Z File "", line 1, in 2023-01-11T22:51:00.5484687Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5484810Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5485013Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5485163Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5485374Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5485481Z self.run() 2023-01-11T22:51:00.5485679Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5485825Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5486167Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5486282Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5486642Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5486767Z getattr(self, test_name)() 2023-01-11T22:51:00.5487125Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5487221Z fn() 2023-01-11T22:51:00.5487582Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5487707Z test(self, **param_kwargs) 2023-01-11T22:51:00.5488103Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5488228Z return func(*args, **kwargs) 2023-01-11T22:51:00.5488508Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.5488622Z self.run_subtests( 2023-01-11T22:51:00.5488971Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5489132Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5489496Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5489646Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5490018Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5490123Z output = model(*input) 2023-01-11T22:51:00.5490445Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5490624Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5491007Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5491182Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5491546Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:51:00.5491667Z _lazy_init(state, module) 2023-01-11T22:51:00.5492018Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5492168Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5492565Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5492713Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5493107Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5493232Z return func(*args, **kwargs) 2023-01-11T22:51:00.5493608Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5493711Z p_assert( 2023-01-11T22:51:00.5494046Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5494154Z traceback.print_stack() 2023-01-11T22:51:00.5494397Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2023-01-11T22:51:00.5494639Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2023-01-11T22:51:00.5495040Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:51:00.5495783Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5495916Z File "", line 1, in 2023-01-11T22:51:00.5496123Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5496268Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5496468Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5496910Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5497135Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5497324Z self.run() 2023-01-11T22:51:00.5497528Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5497677Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5498028Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5498162Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5498502Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5498625Z getattr(self, test_name)() 2023-01-11T22:51:00.5498981Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5499080Z fn() 2023-01-11T22:51:00.5499443Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5499564Z test(self, **param_kwargs) 2023-01-11T22:51:00.5513630Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5513900Z return func(*args, **kwargs) 2023-01-11T22:51:00.5514210Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.5514331Z self.run_subtests( 2023-01-11T22:51:00.5514711Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5514880Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5515260Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5515417Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5515783Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5515914Z output = model(*input) 2023-01-11T22:51:00.5516245Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5516391Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5516772Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5516951Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5517321Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.5517449Z _lazy_init(state, module) 2023-01-11T22:51:00.5517787Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5518046Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5518520Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5518674Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5519020Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5519150Z return func(*args, **kwargs) 2023-01-11T22:51:00.5519532Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5519639Z p_assert( 2023-01-11T22:51:00.5519960Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5520092Z traceback.print_stack() 2023-01-11T22:51:00.5520494Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:51:00.5521247Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5521450Z File "", line 1, in 2023-01-11T22:51:00.5521668Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5521814Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5522020Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5522156Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5522434Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5522546Z self.run() 2023-01-11T22:51:00.5522753Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5522910Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5523312Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5523497Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5524009Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5524127Z getattr(self, test_name)() 2023-01-11T22:51:00.5524494Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5524595Z fn() 2023-01-11T22:51:00.5524959Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5525088Z test(self, **param_kwargs) 2023-01-11T22:51:00.5525443Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5525572Z return func(*args, **kwargs) 2023-01-11T22:51:00.5525856Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.5525956Z self.run_subtests( 2023-01-11T22:51:00.5526315Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5526480Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5526847Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5527002Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5527378Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5527501Z output = model(*input) 2023-01-11T22:51:00.5527825Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5527951Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5528329Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5528509Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5528879Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:51:00.5529003Z _lazy_init(state, module) 2023-01-11T22:51:00.5529362Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5529533Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5529931Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5530059Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5530456Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5530586Z return func(*args, **kwargs) 2023-01-11T22:51:00.5530971Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5531077Z p_assert( 2023-01-11T22:51:00.5531415Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5531546Z traceback.print_stack() 2023-01-11T22:51:00.5531795Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 1 2023-01-11T22:51:00.5532022Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 0 2023-01-11T22:51:00.5532422Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:51:00.5533208Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5533358Z File "", line 1, in 2023-01-11T22:51:00.5533577Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5533728Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5533935Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5534089Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5534305Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5534394Z self.run() 2023-01-11T22:51:00.5534601Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5534751Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5535103Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5535243Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5535612Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5535741Z getattr(self, test_name)() 2023-01-11T22:51:00.5536102Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5536188Z fn() 2023-01-11T22:51:00.5536978Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5537129Z test(self, **param_kwargs) 2023-01-11T22:51:00.5537499Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5537630Z return func(*args, **kwargs) 2023-01-11T22:51:00.5537916Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.5538034Z self.run_subtests( 2023-01-11T22:51:00.5538391Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5538540Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5538904Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5539061Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5539439Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5539562Z output = model(*input) 2023-01-11T22:51:00.5539893Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5540130Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5540513Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5540677Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5541048Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.5541172Z _lazy_init(state, module) 2023-01-11T22:51:00.5541530Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5541701Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5542100Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5542248Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5542591Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5542702Z return func(*args, **kwargs) 2023-01-11T22:51:00.5543159Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5543277Z p_assert( 2023-01-11T22:51:00.5543783Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5544042Z traceback.print_stack() 2023-01-11T22:51:00.5550865Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:51:00.5551665Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5551800Z File "", line 1, in 2023-01-11T22:51:00.5552007Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5552145Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5552340Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5552472Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5552679Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5552775Z self.run() 2023-01-11T22:51:00.5552967Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5553104Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5553440Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5553564Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5553921Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5554026Z getattr(self, test_name)() 2023-01-11T22:51:00.5554375Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5554464Z fn() 2023-01-11T22:51:00.5554820Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5554934Z test(self, **param_kwargs) 2023-01-11T22:51:00.5555285Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5555404Z return func(*args, **kwargs) 2023-01-11T22:51:00.5555672Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.5555770Z self.run_subtests( 2023-01-11T22:51:00.5556209Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5556361Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5556717Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5556860Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5557225Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5557335Z output = model(*input) 2023-01-11T22:51:00.5557654Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5557773Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5558139Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5558306Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5558725Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:51:00.5558844Z _lazy_init(state, module) 2023-01-11T22:51:00.5559189Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5559347Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5559731Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5559857Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5581025Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5581146Z return func(*args, **kwargs) 2023-01-11T22:51:00.5581514Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5581616Z p_assert( 2023-01-11T22:51:00.5581948Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5582066Z traceback.print_stack() 2023-01-11T22:51:00.5582300Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 1 2023-01-11T22:51:00.5582525Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 0 2023-01-11T22:51:00.5582914Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:51:00.5583650Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5583776Z File "", line 1, in 2023-01-11T22:51:00.5583980Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5584114Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5584305Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5584446Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5584648Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5584735Z self.run() 2023-01-11T22:51:00.5584925Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5585061Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5585393Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5585517Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5585995Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5586110Z getattr(self, test_name)() 2023-01-11T22:51:00.5586462Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5586562Z fn() 2023-01-11T22:51:00.5586914Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5587027Z test(self, **param_kwargs) 2023-01-11T22:51:00.5587371Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5587486Z return func(*args, **kwargs) 2023-01-11T22:51:00.5587752Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.5587860Z self.run_subtests( 2023-01-11T22:51:00.5588199Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5588418Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5588784Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5588926Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5589292Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5589404Z output = model(*input) 2023-01-11T22:51:00.5589719Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5589847Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5590213Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5590374Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5590733Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.5590846Z _lazy_init(state, module) 2023-01-11T22:51:00.5591189Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5591347Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5591787Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5591922Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5592250Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5592358Z return func(*args, **kwargs) 2023-01-11T22:51:00.5592725Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5592822Z p_assert( 2023-01-11T22:51:00.5593148Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5593266Z traceback.print_stack() 2023-01-11T22:51:00.5593654Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:51:00.5594389Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5594511Z File "", line 1, in 2023-01-11T22:51:00.5594713Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5594899Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5595094Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5595239Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5595441Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5595537Z self.run() 2023-01-11T22:51:00.5595729Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5595865Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5596189Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5596315Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5596666Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5596781Z getattr(self, test_name)() 2023-01-11T22:51:00.5597132Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5597221Z fn() 2023-01-11T22:51:00.5597622Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5597740Z test(self, **param_kwargs) 2023-01-11T22:51:00.5598079Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5598194Z return func(*args, **kwargs) 2023-01-11T22:51:00.5598460Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.5598566Z self.run_subtests( 2023-01-11T22:51:00.5598906Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5599058Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5599415Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5599557Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5599915Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5600026Z output = model(*input) 2023-01-11T22:51:00.5600348Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5600487Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5600859Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5601015Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5601380Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:51:00.5601502Z _lazy_init(state, module) 2023-01-11T22:51:00.5601852Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5602022Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5602417Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5602559Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5602897Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5603004Z return func(*args, **kwargs) 2023-01-11T22:51:00.5603377Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5603479Z p_assert( 2023-01-11T22:51:00.5603813Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5603996Z traceback.print_stack() 2023-01-11T22:51:00.5604243Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 1 2023-01-11T22:51:00.5604485Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 0 2023-01-11T22:51:00.5604884Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:51:00.5605629Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5605744Z File "", line 1, in 2023-01-11T22:51:00.5605955Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5606101Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5606303Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5606499Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5606722Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5606828Z self.run() 2023-01-11T22:51:00.5607030Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5607159Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5607501Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5607633Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5607992Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5608114Z getattr(self, test_name)() 2023-01-11T22:51:00.5608478Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5608576Z fn() 2023-01-11T22:51:00.5608923Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5609047Z test(self, **param_kwargs) 2023-01-11T22:51:00.5609401Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5609527Z return func(*args, **kwargs) 2023-01-11T22:51:00.5609807Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.5609922Z self.run_subtests( 2023-01-11T22:51:00.5610271Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5610431Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5610780Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5610935Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5611308Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5611428Z output = model(*input) 2023-01-11T22:51:00.5611753Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5611889Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5612260Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5612433Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5612798Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.5612961Z _lazy_init(state, module) 2023-01-11T22:51:00.5613319Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5613487Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5613882Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5614025Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5614362Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5614487Z return func(*args, **kwargs) 2023-01-11T22:51:00.5614862Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5614947Z p_assert( 2023-01-11T22:51:00.5615285Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5615413Z traceback.print_stack() 2023-01-11T22:51:00.5615854Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:51:00.5616989Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5617136Z File "", line 1, in 2023-01-11T22:51:00.5617346Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5617489Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5617673Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5617829Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5618038Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5618141Z self.run() 2023-01-11T22:51:00.5618347Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5618492Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5618843Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5618977Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5619319Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5619442Z getattr(self, test_name)() 2023-01-11T22:51:00.5619796Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5619896Z fn() 2023-01-11T22:51:00.5620257Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5620384Z test(self, **param_kwargs) 2023-01-11T22:51:00.5620740Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5620867Z return func(*args, **kwargs) 2023-01-11T22:51:00.5621124Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.5621238Z self.run_subtests( 2023-01-11T22:51:00.5621589Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5621752Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5622113Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5622263Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5622731Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5622852Z output = model(*input) 2023-01-11T22:51:00.5623164Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5623302Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5623676Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5623848Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5624215Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:51:00.5624336Z _lazy_init(state, module) 2023-01-11T22:51:00.5624683Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5624852Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5625287Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5625441Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5625783Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5625909Z return func(*args, **kwargs) 2023-01-11T22:51:00.5626284Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5626387Z p_assert( 2023-01-11T22:51:00.5626724Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5626852Z traceback.print_stack() 2023-01-11T22:51:00.5627077Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 1 2023-01-11T22:51:00.5627324Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 0 2023-01-11T22:51:00.5627724Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:51:00.5628470Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5628600Z File "", line 1, in 2023-01-11T22:51:00.5628812Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5628952Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5629153Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5629307Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5629500Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5629603Z self.run() 2023-01-11T22:51:00.5629806Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5629951Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5630293Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5630424Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5630784Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5630906Z getattr(self, test_name)() 2023-01-11T22:51:00.5631243Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5631342Z fn() 2023-01-11T22:51:00.5631766Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5631891Z test(self, **param_kwargs) 2023-01-11T22:51:00.5632251Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5632375Z return func(*args, **kwargs) 2023-01-11T22:51:00.5632652Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.5632767Z self.run_subtests( 2023-01-11T22:51:00.5633103Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5633263Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5633624Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5633774Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5634154Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5634319Z output = model(*input) 2023-01-11T22:51:00.5634657Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5634799Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5635154Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5635328Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5635694Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.5635813Z _lazy_init(state, module) 2023-01-11T22:51:00.5636161Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5636333Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5636732Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5636874Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5637192Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5637316Z return func(*args, **kwargs) 2023-01-11T22:51:00.5637693Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5637794Z p_assert( 2023-01-11T22:51:00.5638127Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5638252Z traceback.print_stack() 2023-01-11T22:51:00.5638649Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:51:00.5639399Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5639531Z File "", line 1, in 2023-01-11T22:51:00.5639723Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5639865Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5640065Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5640214Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5640426Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5640529Z self.run() 2023-01-11T22:51:00.5640791Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5640918Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5641266Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5641398Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5641757Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5641880Z getattr(self, test_name)() 2023-01-11T22:51:00.5642236Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5642334Z fn() 2023-01-11T22:51:00.5642697Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5642802Z test(self, **param_kwargs) 2023-01-11T22:51:00.5643158Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5643286Z return func(*args, **kwargs) 2023-01-11T22:51:00.5643619Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.5643740Z self.run_subtests( 2023-01-11T22:51:00.5644095Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5644257Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5644619Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5644752Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5645125Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5645245Z output = model(*input) 2023-01-11T22:51:00.5645572Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5645710Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5646087Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5646261Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5646624Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:51:00.5646726Z _lazy_init(state, module) 2023-01-11T22:51:00.5647076Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5647242Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5647634Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5647779Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5648117Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5648242Z return func(*args, **kwargs) 2023-01-11T22:51:00.5648618Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5648720Z p_assert( 2023-01-11T22:51:00.5649037Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5649162Z traceback.print_stack() 2023-01-11T22:51:00.5649405Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 1 2023-01-11T22:51:00.5649651Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 0 2023-01-11T22:51:00.5650049Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:51:00.5650855Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5650988Z File "", line 1, in 2023-01-11T22:51:00.5651199Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5651341Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5651524Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5651672Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5651883Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5651986Z self.run() 2023-01-11T22:51:00.5652191Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5652335Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5652719Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5652839Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5653203Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5653327Z getattr(self, test_name)() 2023-01-11T22:51:00.5653684Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5653787Z fn() 2023-01-11T22:51:00.5654146Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5654268Z test(self, **param_kwargs) 2023-01-11T22:51:00.5654621Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5654734Z return func(*args, **kwargs) 2023-01-11T22:51:00.5655012Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.5655126Z self.run_subtests( 2023-01-11T22:51:00.5655474Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5655635Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5655996Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5656149Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5656522Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5656971Z output = model(*input) 2023-01-11T22:51:00.5657308Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5657446Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5657822Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5657997Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5658365Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.5658486Z _lazy_init(state, module) 2023-01-11T22:51:00.5658840Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5658990Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5659388Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5659619Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5659965Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5660090Z return func(*args, **kwargs) 2023-01-11T22:51:00.5660468Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5660572Z p_assert( 2023-01-11T22:51:00.5660905Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5661013Z traceback.print_stack() 2023-01-11T22:51:00.5661411Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:51:00.5662154Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5662345Z File "", line 1, in 2023-01-11T22:51:00.5662567Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5662712Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5662913Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5663063Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5663273Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5663359Z self.run() 2023-01-11T22:51:00.5663560Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5663707Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5664050Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5664188Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5664554Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5664676Z getattr(self, test_name)() 2023-01-11T22:51:00.5665030Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5665111Z fn() 2023-01-11T22:51:00.5665472Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5665595Z test(self, **param_kwargs) 2023-01-11T22:51:00.5665950Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5666075Z return func(*args, **kwargs) 2023-01-11T22:51:00.5666349Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.5666466Z self.run_subtests( 2023-01-11T22:51:00.5666822Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5666966Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5667329Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5667481Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5667853Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5667974Z output = model(*input) 2023-01-11T22:51:00.5668298Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5668434Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5668874Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5669030Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5669401Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:51:00.5669523Z _lazy_init(state, module) 2023-01-11T22:51:00.5669876Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5670043Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5670438Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5670579Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5670915Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5671026Z return func(*args, **kwargs) 2023-01-11T22:51:00.5671447Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5671557Z p_assert( 2023-01-11T22:51:00.5671896Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5672022Z traceback.print_stack() 2023-01-11T22:51:00.5672267Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 1 2023-01-11T22:51:00.5672506Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 0 2023-01-11T22:51:00.5672901Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:51:00.5673275Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:51:00.5674025Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5674763Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5675493Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5676235Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5676964Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5677096Z File "", line 1, in 2023-01-11T22:51:00.5677307Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5677448Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5677651Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5677871Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5678084Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5678174Z self.run() 2023-01-11T22:51:00.5678378Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5678523Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5678865Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5678998Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5679358Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5679481Z getattr(self, test_name)() 2023-01-11T22:51:00.5679837Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5679921Z fn() 2023-01-11T22:51:00.5680284Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5680406Z test(self, **param_kwargs) 2023-01-11T22:51:00.5680804Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5680936Z return func(*args, **kwargs) 2023-01-11T22:51:00.5681213Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.5681328Z self.run_subtests( 2023-01-11T22:51:00.5681682Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5681826Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5682187Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5682342Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5682718Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5682838Z output = model(*input) 2023-01-11T22:51:00.5683163Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5683300Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5683673Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5683830Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5684197Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.5684317Z _lazy_init(state, module) 2023-01-11T22:51:00.5684668Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5684838Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5685234Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5685376Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5685714Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5685839Z return func(*args, **kwargs) 2023-01-11T22:51:00.5686196Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5686300Z p_assert( 2023-01-11T22:51:00.5686634Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5686759Z traceback.print_stack() 2023-01-11T22:51:00.5687559Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5688297Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5689031Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5689801Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5690537Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5690669Z File "", line 1, in 2023-01-11T22:51:00.5690881Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5691024Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5691210Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5691363Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5691627Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5691735Z self.run() 2023-01-11T22:51:00.5691941Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5692089Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5692434Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5692549Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5692909Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5693033Z getattr(self, test_name)() 2023-01-11T22:51:00.5693389Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5693488Z fn() 2023-01-11T22:51:00.5693851Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5693972Z test(self, **param_kwargs) 2023-01-11T22:51:00.5694330Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5694438Z return func(*args, **kwargs) 2023-01-11T22:51:00.5694713Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.5694828Z 
self.run_subtests( 2023-01-11T22:51:00.5695180Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5695341Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5695703Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5695854Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5696290Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5696396Z output = model(*input) 2023-01-11T22:51:00.5696913Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5697055Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5697437Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5697614Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5697982Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.5698104Z _lazy_init(state, module) 2023-01-11T22:51:00.5698452Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5698624Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5699075Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5699229Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5699570Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5699697Z return func(*args, **kwargs) 2023-01-11T22:51:00.5700074Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5700177Z p_assert( 2023-01-11T22:51:00.5700509Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5700635Z traceback.print_stack() 2023-01-11T22:51:00.5700863Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 1 2023-01-11T22:51:00.5701105Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 0 2023-01-11T22:51:00.5701506Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:51:00.5702244Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5702374Z File "", line 1, in 2023-01-11T22:51:00.5702583Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5702725Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5702929Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5703065Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5703276Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5703380Z self.run() 2023-01-11T22:51:00.5703587Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5703733Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5704078Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5704210Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5704568Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5704673Z getattr(self, test_name)() 2023-01-11T22:51:00.5705029Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5705126Z fn() 2023-01-11T22:51:00.5705566Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5705690Z test(self, **param_kwargs) 2023-01-11T22:51:00.5706047Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5706173Z return func(*args, **kwargs) 2023-01-11T22:51:00.5706449Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.5706545Z self.run_subtests( 2023-01-11T22:51:00.5706894Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5707055Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5707414Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5707571Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5707943Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5708121Z output = model(*input) 2023-01-11T22:51:00.5708456Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5708578Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5708952Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5709125Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5709493Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.5709612Z _lazy_init(state, module) 2023-01-11T22:51:00.5709963Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5710134Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5710532Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5710657Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5710992Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5711117Z return func(*args, **kwargs) 2023-01-11T22:51:00.5711494Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5711596Z p_assert( 2023-01-11T22:51:00.5711929Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5712056Z traceback.print_stack() 2023-01-11T22:51:00.5712451Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:51:00.5713202Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5713316Z File "", line 1, in 2023-01-11T22:51:00.5713525Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5713665Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5713871Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5714021Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5714233Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5714337Z self.run() 2023-01-11T22:51:00.5714592Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5714719Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5715061Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5715195Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5715557Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5715682Z getattr(self, test_name)() 2023-01-11T22:51:00.5716042Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5716139Z fn() 2023-01-11T22:51:00.5716502Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5716608Z test(self, **param_kwargs) 2023-01-11T22:51:00.5716963Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5717092Z return func(*args, **kwargs) 2023-01-11T22:51:00.5717409Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.5717530Z self.run_subtests( 2023-01-11T22:51:00.5717882Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5718041Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5718404Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5718538Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5718910Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5719033Z output = model(*input) 2023-01-11T22:51:00.5719356Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5719494Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5719872Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5720045Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5720407Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:51:00.5720510Z _lazy_init(state, module) 2023-01-11T22:51:00.5720862Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5721029Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5721424Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5721569Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5721908Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5722032Z return func(*args, **kwargs) 2023-01-11T22:51:00.5722406Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5722491Z p_assert( 2023-01-11T22:51:00.5722825Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5722949Z traceback.print_stack() 2023-01-11T22:51:00.5723194Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 1 2023-01-11T22:51:00.5723433Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 0 2023-01-11T22:51:00.5723832Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:51:00.5724638Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5724772Z File "", line 1, in 2023-01-11T22:51:00.5724981Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5725105Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5725307Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5725458Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5725672Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5725781Z self.run() 2023-01-11T22:51:00.5725982Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5726126Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5726491Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5726633Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5726995Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5727119Z getattr(self, test_name)() 2023-01-11T22:51:00.5727476Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5727574Z fn() 2023-01-11T22:51:00.5727933Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5728055Z test(self, **param_kwargs) 2023-01-11T22:51:00.5728396Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5728521Z return func(*args, **kwargs) 2023-01-11T22:51:00.5728799Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.5728911Z self.run_subtests( 2023-01-11T22:51:00.5729262Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5729425Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5729785Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5729935Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5730292Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5730416Z output = model(*input) 2023-01-11T22:51:00.5730738Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5730880Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5731254Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5731429Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5731796Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.5731917Z _lazy_init(state, module) 2023-01-11T22:51:00.5732268Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5732416Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5732810Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5733006Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5733351Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5733477Z return func(*args, **kwargs) 2023-01-11T22:51:00.5733851Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5733955Z p_assert( 2023-01-11T22:51:00.5734287Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5734396Z traceback.print_stack() 2023-01-11T22:51:00.5734790Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:51:00.5735530Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5735711Z File "", line 1, in 2023-01-11T22:51:00.5735928Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5736070Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5736276Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5736427Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5736810Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5736923Z self.run() 2023-01-11T22:51:00.5737126Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5737274Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5737629Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5737762Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5738126Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5738249Z getattr(self, test_name)() 2023-01-11T22:51:00.5738586Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5738682Z fn() 2023-01-11T22:51:00.5739044Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5739167Z test(self, **param_kwargs) 2023-01-11T22:51:00.5739522Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5739647Z return func(*args, **kwargs) 2023-01-11T22:51:00.5739921Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.5740042Z self.run_subtests( 2023-01-11T22:51:00.5740376Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5740538Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5740900Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5741050Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5741424Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5741543Z output = model(*input) 2023-01-11T22:51:00.5741866Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5742003Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5742451Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5742629Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5742997Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:51:00.5743121Z _lazy_init(state, module) 2023-01-11T22:51:00.5743473Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5743639Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5744033Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5744174Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5744492Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5744622Z return func(*args, **kwargs) 2023-01-11T22:51:00.5745053Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5745165Z p_assert( 2023-01-11T22:51:00.5745501Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5745628Z traceback.print_stack() 2023-01-11T22:51:00.5745876Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 1 2023-01-11T22:51:00.5746114Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 0 2023-01-11T22:51:00.5746496Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:51:00.5747246Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5747386Z File "", line 1, in 2023-01-11T22:51:00.5747594Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5747735Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5747937Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5748086Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5748295Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5748399Z self.run() 2023-01-11T22:51:00.5748581Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5748726Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5749071Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5749203Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5749565Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5749687Z getattr(self, test_name)() 2023-01-11T22:51:00.5750045Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5750142Z fn() 2023-01-11T22:51:00.5750485Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5750608Z test(self, **param_kwargs) 2023-01-11T22:51:00.5750961Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5751086Z return func(*args, **kwargs) 2023-01-11T22:51:00.5751430Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.5751546Z self.run_subtests( 2023-01-11T22:51:00.5751907Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5752069Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5752416Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5752568Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5752944Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5753063Z output = model(*input) 2023-01-11T22:51:00.5753389Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5753527Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5753903Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5754126Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5754481Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.5754604Z _lazy_init(state, module) 2023-01-11T22:51:00.5754957Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5755124Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5755517Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5755659Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5755995Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5756124Z return func(*args, **kwargs) 2023-01-11T22:51:00.5756484Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5756581Z p_assert( 2023-01-11T22:51:00.5756912Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5757036Z traceback.print_stack() 2023-01-11T22:51:00.5757433Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:51:00.5758173Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5758306Z File "", line 1, in 2023-01-11T22:51:00.5758515Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5758659Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5758844Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5758994Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5759204Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5759307Z self.run() 2023-01-11T22:51:00.5759507Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5759651Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5759990Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5760105Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5760463Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5760645Z getattr(self, test_name)() 2023-01-11T22:51:00.5761014Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5761116Z fn() 2023-01-11T22:51:00.5761479Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5761602Z test(self, **param_kwargs) 2023-01-11T22:51:00.5761956Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5762063Z return func(*args, **kwargs) 2023-01-11T22:51:00.5762342Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.5762455Z self.run_subtests( 2023-01-11T22:51:00.5762806Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5762970Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5763376Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5763535Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5763911Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5764012Z output = model(*input) 2023-01-11T22:51:00.5764335Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5764474Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5764845Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5765018Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5765387Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:51:00.5765511Z _lazy_init(state, module) 2023-01-11T22:51:00.5765861Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5766027Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5766403Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5766544Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5766881Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5767006Z return func(*args, **kwargs) 2023-01-11T22:51:00.5767382Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5767488Z p_assert( 2023-01-11T22:51:00.5767823Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5767953Z traceback.print_stack() 2023-01-11T22:51:00.5768182Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 1 2023-01-11T22:51:00.5768419Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 0 2023-01-11T22:51:00.5768818Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2023-01-11T22:51:00.5769563Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5770005Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2023-01-11T22:51:00.5770736Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5770978Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 1 2023-01-11T22:51:00.5771212Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 0 2023-01-11T22:51:00.5771601Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:51:00.5771992Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:51:00.5772219Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 1 2023-01-11T22:51:00.5772510Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 0 2023-01-11T22:51:00.5772915Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2023-01-11T22:51:00.5773308Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 
2023-01-11T22:51:00.5773546Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 1 2023-01-11T22:51:00.5773778Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 0 2023-01-11T22:51:00.5774167Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 2023-01-11T22:51:00.5774557Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 2023-01-11T22:51:00.5774798Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 1 2023-01-11T22:51:00.5775032Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 0 2023-01-11T22:51:00.5775405Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:51:00.5775790Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:51:00.5776025Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 1 2023-01-11T22:51:00.5776256Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 0 2023-01-11T22:51:00.5776844Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2023-01-11T22:51:00.5777250Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 
2023-01-11T22:51:00.5777491Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 1 2023-01-11T22:51:00.5777721Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 0 2023-01-11T22:51:00.5778110Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:51:00.5778477Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:51:00.5779222Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5779552Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 1 2023-01-11T22:51:00.5779792Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 0 2023-01-11T22:51:00.5780180Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:51:00.5780566Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:51:00.5781301Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5782090Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5782841Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5783582Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5784314Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5785038Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5785766Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5786498Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5787219Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5787945Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5788731Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5789457Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5790180Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5790944Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5791731Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5792458Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5793187Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5793911Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5794627Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5795353Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5796070Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5796791Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5797574Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5798295Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5799015Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5799779Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5800512Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5801232Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5801964Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5802682Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5803402Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5804128Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5804848Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5805564Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5806344Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5807065Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5807785Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5808544Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5808796Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 0 2023-01-11T22:51:00.5809017Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 1 2023-01-11T22:51:00.5809418Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 2023-01-11T22:51:00.5809811Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 2023-01-11T22:51:00.5810535Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5811266Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5811506Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 1 2023-01-11T22:51:00.5811739Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 0 2023-01-11T22:51:00.5812127Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:51:00.5812849Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5813236Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:51:00.5813476Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 1 2023-01-11T22:51:00.5813866Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 2023-01-11T22:51:00.5814088Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 0 2023-01-11T22:51:00.5814481Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 2023-01-11T22:51:00.5815276Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5815515Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 1 2023-01-11T22:51:00.5815748Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 0 2023-01-11T22:51:00.5816135Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:51:00.5817038Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5817444Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:51:00.5817774Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:26 to store for rank: 1 2023-01-11T22:51:00.5818185Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:26 with 2 nodes. 2023-01-11T22:51:00.5818426Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:26 to store for rank: 0 2023-01-11T22:51:00.5818795Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:26 with 2 nodes. 2023-01-11T22:51:00.5819525Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5819768Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:27 to store for rank: 1 2023-01-11T22:51:00.5820002Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:27 to store for rank: 0 2023-01-11T22:51:00.5820390Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:27 with 2 nodes. 2023-01-11T22:51:00.5821115Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5821498Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:27 with 2 nodes. 2023-01-11T22:51:00.5821740Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:28 to store for rank: 1 2023-01-11T22:51:00.5821975Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:28 to store for rank: 0 2023-01-11T22:51:00.5822368Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:28 with 2 nodes. 2023-01-11T22:51:00.5823103Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5823493Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:28 with 2 nodes. 
2023-01-11T22:51:00.5823716Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:29 to store for rank: 1 2023-01-11T22:51:00.5824017Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:29 to store for rank: 0 2023-01-11T22:51:00.5824416Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:29 with 2 nodes. 2023-01-11T22:51:00.5825151Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5825536Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:29 with 2 nodes. 2023-01-11T22:51:00.5825776Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:30 to store for rank: 1 2023-01-11T22:51:00.5826012Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:30 to store for rank: 0 2023-01-11T22:51:00.5826408Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:30 with 2 nodes. 2023-01-11T22:51:00.5827194Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5827594Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:30 with 2 nodes. 
2023-01-11T22:51:00.5827833Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:31 to store for rank: 1 2023-01-11T22:51:00.5828220Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:31 with 2 nodes. 2023-01-11T22:51:00.5828440Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:31 to store for rank: 0 2023-01-11T22:51:00.5828833Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:31 with 2 nodes. 2023-01-11T22:51:00.5829568Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5829807Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:32 to store for rank: 1 2023-01-11T22:51:00.5830037Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:32 to store for rank: 0 2023-01-11T22:51:00.5830426Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:32 with 2 nodes. 2023-01-11T22:51:00.5831150Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5831537Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:32 with 2 nodes. 2023-01-11T22:51:00.5832254Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5832487Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:33 to store for rank: 1 2023-01-11T22:51:00.5832871Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:33 with 2 nodes. 2023-01-11T22:51:00.5833162Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:33 to store for rank: 0 2023-01-11T22:51:00.5833541Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:33 with 2 nodes. 2023-01-11T22:51:00.5834273Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5834996Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5835789Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5836535Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5837253Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5837981Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5838704Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5839428Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5840153Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5840874Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5841594Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5842318Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5843094Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5843816Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5844578Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5845313Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5846034Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5846758Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5847478Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5848197Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5848916Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5849638Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5850355Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5851077Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5851844Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5852564Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5853320Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5854052Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5854771Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5855492Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5856209Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5857104Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5857840Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5858563Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5859283Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5859997Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5860801Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5861521Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5862291Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5863022Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5863742Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5864469Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5865188Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5865905Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5866625Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5867346Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5867591Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:34 to store for rank: 1 2023-01-11T22:51:00.5867826Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:34 to store for rank: 0 2023-01-11T22:51:00.5868226Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:34 with 2 nodes. 2023-01-11T22:51:00.5869003Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5869397Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:34 with 2 nodes. 
2023-01-11T22:51:00.5870096Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5870336Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:35 to store for rank: 1 2023-01-11T22:51:00.5870569Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:35 to store for rank: 0 2023-01-11T22:51:00.5870962Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:35 with 2 nodes. 2023-01-11T22:51:00.5871719Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5872113Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:35 with 2 nodes. 2023-01-11T22:51:00.5872828Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5873070Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:36 to store for rank: 1 2023-01-11T22:51:00.5873308Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:36 to store for rank: 0 2023-01-11T22:51:00.5873693Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:36 with 2 nodes. 
2023-01-11T22:51:00.5874403Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5874785Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:36 with 2 nodes. 2023-01-11T22:51:00.5875496Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5875737Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:37 to store for rank: 1 2023-01-11T22:51:00.5875951Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:37 to store for rank: 0 2023-01-11T22:51:00.5876338Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:37 with 2 nodes. 2023-01-11T22:51:00.5877047Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5877486Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:37 with 2 nodes. 2023-01-11T22:51:00.5878202Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5878316Z dist init r=1, world=2 2023-01-11T22:51:00.5878643Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.5878959Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.5879264Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.5879609Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.5879919Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.5880221Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.5880555Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.5880859Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.5881163Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.5881465Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.5881762Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.5882061Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.5882173Z dist init r=0, world=2 2023-01-11T22:51:00.5882496Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.5882808Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.5883116Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.5883418Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.5883701Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.5884000Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.5884297Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.5884656Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.5884955Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.5885255Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.5885552Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.5885851Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.5885957Z ok (30.957s) 2023-01-11T22:51:00.5886347Z test_mixture_of_experts_with_delay_before_free_offload_true_none (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 93666 2023-01-11T22:51:00.5886576Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 93667 2023-01-11T22:51:00.5886936Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.5887114Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.5887494Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.5887684Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.5888050Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.5888229Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.5888606Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.5888795Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.5889020Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.5889264Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.5889658Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.5890049Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:51:00.5890275Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.5890507Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.5891520Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.5891686Z warnings.warn( 2023-01-11T22:51:00.5892695Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.5892868Z warnings.warn( 2023-01-11T22:51:00.5893115Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:51:00.5893357Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:51:00.5893734Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:51:00.5894256Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 
2023-01-11T22:51:00.5894366Z warnings.warn( 2023-01-11T22:51:00.5895108Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5895301Z File "<string>", line 1, in <module> 2023-01-11T22:51:00.5895521Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5895664Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5895868Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5896019Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5896214Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5896319Z self.run() 2023-01-11T22:51:00.5896522Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5896842Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5897194Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5897334Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5897700Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5897824Z getattr(self, test_name)() 2023-01-11T22:51:00.5898166Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5898263Z fn() 2023-01-11T22:51:00.5898628Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5898750Z test(self, **param_kwargs) 2023-01-11T22:51:00.5899104Z File
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5899227Z return func(*args, **kwargs) 2023-01-11T22:51:00.5899503Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.5899621Z self.run_subtests( 2023-01-11T22:51:00.5899960Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5900122Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5900482Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5900635Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5901008Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5901127Z output = model(*input) 2023-01-11T22:51:00.5901451Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5901590Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5902042Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5902220Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5902588Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.5902709Z _lazy_init(state, module) 2023-01-11T22:51:00.5903059Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5903227Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5903621Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5903764Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5904083Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5904212Z return func(*args, **kwargs) 2023-01-11T22:51:00.5904646Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5904760Z p_assert( 2023-01-11T22:51:00.5905099Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5905226Z traceback.print_stack() 2023-01-11T22:51:00.5905622Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:51:00.5906149Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2023-01-11T22:51:00.5906243Z warnings.warn( 2023-01-11T22:51:00.5906991Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5907129Z File "<string>", line 1, in <module> 2023-01-11T22:51:00.5907338Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5907480Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5907683Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5907832Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5908045Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5908149Z self.run() 2023-01-11T22:51:00.5908334Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5908481Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5908823Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5908955Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5909318Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5909441Z getattr(self, test_name)() 2023-01-11T22:51:00.5909797Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5909877Z fn() 2023-01-11T22:51:00.5910241Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5910364Z test(self, **param_kwargs) 2023-01-11T22:51:00.5910717Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5910841Z return func(*args, **kwargs) 2023-01-11T22:51:00.5911177Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.5911292Z self.run_subtests( 2023-01-11T22:51:00.5911650Z File
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5911793Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5912156Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5912308Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5912680Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5912799Z output = model(*input) 2023-01-11T22:51:00.5913123Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5913264Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5913637Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5913854Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5914212Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.5914335Z _lazy_init(state, module) 2023-01-11T22:51:00.5914684Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5914851Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5915244Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5915390Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5915727Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5915857Z return func(*args, **kwargs) 2023-01-11T22:51:00.5916217Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5916321Z p_assert( 2023-01-11T22:51:00.5916655Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5916780Z traceback.print_stack() 2023-01-11T22:51:00.5917024Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:51:00.5917267Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:51:00.5917661Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:51:00.5918405Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5918538Z File "<string>", line 1, in <module> 2023-01-11T22:51:00.5918730Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5918871Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5919072Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5919220Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5919431Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5919535Z self.run() 2023-01-11T22:51:00.5919735Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5919864Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5920261Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5920394Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5920759Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5920882Z getattr(self, test_name)() 2023-01-11T22:51:00.5921238Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5921336Z fn() 2023-01-11T22:51:00.5921697Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5921802Z test(self, **param_kwargs) 2023-01-11T22:51:00.5922157Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5922281Z return func(*args, **kwargs) 2023-01-11T22:51:00.5922561Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.5922673Z self.run_subtests( 2023-01-11T22:51:00.5923083Z File
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5923253Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5923621Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5923756Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5924131Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5924251Z output = model(*input) 2023-01-11T22:51:00.5924576Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5924715Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5925089Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5925265Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5925631Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.5925734Z _lazy_init(state, module) 2023-01-11T22:51:00.5926084Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5926250Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5926647Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5926787Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5927124Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5927251Z return func(*args, **kwargs) 2023-01-11T22:51:00.5927633Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5927719Z p_assert( 2023-01-11T22:51:00.5928054Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5928179Z traceback.print_stack() 2023-01-11T22:51:00.5928579Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:51:00.5929321Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5929509Z File "<string>", line 1, in <module> 2023-01-11T22:51:00.5929721Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5929867Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5930068Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5930201Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5930413Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5930517Z self.run() 2023-01-11T22:51:00.5930718Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5930864Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5931204Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5931335Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5931695Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5931802Z getattr(self, test_name)() 2023-01-11T22:51:00.5932202Z
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5932307Z fn() 2023-01-11T22:51:00.5932673Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5932797Z test(self, **param_kwargs) 2023-01-11T22:51:00.5933152Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5933275Z return func(*args, **kwargs) 2023-01-11T22:51:00.5933548Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.5933645Z self.run_subtests( 2023-01-11T22:51:00.5933997Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5934158Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5934521Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5934673Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5935047Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5935165Z output = model(*input) 2023-01-11T22:51:00.5935489Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5935610Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5935983Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5936159Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5936526Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:51:00.5936902Z _lazy_init(state, module) 2023-01-11T22:51:00.5937266Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5937433Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5937829Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5937956Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5938293Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5938418Z return func(*args, **kwargs) 2023-01-11T22:51:00.5938794Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5938984Z p_assert( 2023-01-11T22:51:00.5939329Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5939456Z traceback.print_stack() 2023-01-11T22:51:00.5939702Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2023-01-11T22:51:00.5939926Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2023-01-11T22:51:00.5940323Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:51:00.5941064Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5941197Z File "<string>", line 1, in <module> 2023-01-11T22:51:00.5941405Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5941607Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5941827Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5941978Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5942190Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5942276Z self.run() 2023-01-11T22:51:00.5942478Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5942625Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5942967Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5943099Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5943465Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5943588Z getattr(self, test_name)() 2023-01-11T22:51:00.5943949Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5944029Z fn() 2023-01-11T22:51:00.5944392Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5944516Z test(self, **param_kwargs) 2023-01-11T22:51:00.5944872Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5944995Z return func(*args, **kwargs) 2023-01-11T22:51:00.5945270Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.5945383Z self.run_subtests( 2023-01-11T22:51:00.5945724Z File
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5945887Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5946249Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5946403Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5946775Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5946894Z output = model(*input) 2023-01-11T22:51:00.5947217Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5947353Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5947726Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5947940Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5948308Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.5948432Z _lazy_init(state, module) 2023-01-11T22:51:00.5948784Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5948952Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5949348Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5949490Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5949827Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5949935Z return func(*args, **kwargs) 2023-01-11T22:51:00.5950309Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5950414Z p_assert( 2023-01-11T22:51:00.5950797Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5950931Z traceback.print_stack() 2023-01-11T22:51:00.5951331Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:51:00.5952073Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5952205Z File "<string>", line 1, in <module> 2023-01-11T22:51:00.5952412Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5952542Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5952743Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5952893Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5953106Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5953211Z self.run() 2023-01-11T22:51:00.5953411Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5953555Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5953877Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5954010Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5954368Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5954491Z getattr(self, test_name)() 2023-01-11T22:51:00.5954845Z
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5954947Z fn() 2023-01-11T22:51:00.5955310Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5955434Z test(self, **param_kwargs) 2023-01-11T22:51:00.5955769Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5955893Z return func(*args, **kwargs) 2023-01-11T22:51:00.5956168Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.5956283Z self.run_subtests( 2023-01-11T22:51:00.5956633Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5956793Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5957254Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5957409Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5957765Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5957886Z output = model(*input) 2023-01-11T22:51:00.5958211Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5958350Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5958723Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5958895Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5959258Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:51:00.5959385Z _lazy_init(state, module) 2023-01-11T22:51:00.5959720Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5959945Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5960350Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5960492Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5960830Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5960956Z return func(*args, **kwargs) 2023-01-11T22:51:00.5961331Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5961434Z p_assert( 2023-01-11T22:51:00.5961750Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5961881Z traceback.print_stack() 2023-01-11T22:51:00.5962125Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 0 2023-01-11T22:51:00.5962371Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 1 2023-01-11T22:51:00.5962768Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:51:00.5963508Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5963640Z File "", line 1, in 2023-01-11T22:51:00.5963848Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5963993Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5964180Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5964334Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5964544Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5964648Z self.run() 2023-01-11T22:51:00.5964848Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5964992Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5965331Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5965465Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5965804Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5965928Z getattr(self, test_name)() 2023-01-11T22:51:00.5966344Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5966444Z fn() 2023-01-11T22:51:00.5966807Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5966929Z test(self, **param_kwargs) 2023-01-11T22:51:00.5967283Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5967407Z return func(*args, **kwargs) 2023-01-11T22:51:00.5967665Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.5967779Z self.run_subtests( 2023-01-11T22:51:00.5968128Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5968290Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5968654Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5968805Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5969221Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5969350Z output = model(*input) 2023-01-11T22:51:00.5969660Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5969798Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5970167Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5970341Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5970705Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.5970829Z _lazy_init(state, module) 2023-01-11T22:51:00.5971178Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5971347Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5971725Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5971869Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5972205Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5972330Z return func(*args, **kwargs) 2023-01-11T22:51:00.5972703Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5972805Z p_assert( 2023-01-11T22:51:00.5973140Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5973269Z traceback.print_stack() 2023-01-11T22:51:00.5973654Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:51:00.5974397Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5974528Z File "", line 1, in 2023-01-11T22:51:00.5974737Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5974878Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5975080Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5975229Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5975497Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5975602Z self.run() 2023-01-11T22:51:00.5975789Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5975934Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5976273Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5976405Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5976938Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5977068Z getattr(self, test_name)() 2023-01-11T22:51:00.5977435Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5977519Z fn() 2023-01-11T22:51:00.5977880Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5978006Z test(self, **param_kwargs) 2023-01-11T22:51:00.5978429Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5978565Z return func(*args, **kwargs) 2023-01-11T22:51:00.5978844Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.5978959Z self.run_subtests( 2023-01-11T22:51:00.5979317Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5979460Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5979826Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5979977Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5980358Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5980477Z output = model(*input) 2023-01-11T22:51:00.5980802Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5980941Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5981313Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5981469Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5981835Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:51:00.5981954Z _lazy_init(state, module) 2023-01-11T22:51:00.5982303Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5982474Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5982870Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5983015Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5983351Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5983477Z return func(*args, **kwargs) 2023-01-11T22:51:00.5983833Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5983935Z p_assert( 2023-01-11T22:51:00.5984267Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5984395Z traceback.print_stack() 2023-01-11T22:51:00.5984642Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 0 2023-01-11T22:51:00.5984955Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 1 2023-01-11T22:51:00.5985358Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:51:00.5986105Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.5986237Z File "", line 1, in 2023-01-11T22:51:00.5986429Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5986572Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5986774Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5986926Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5987146Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5987249Z self.run() 2023-01-11T22:51:00.5987496Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5987632Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5987976Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5988108Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5988468Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5988592Z getattr(self, test_name)() 2023-01-11T22:51:00.5988951Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.5989050Z fn() 2023-01-11T22:51:00.5989411Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.5989524Z test(self, **param_kwargs) 2023-01-11T22:51:00.5989881Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.5990006Z return func(*args, **kwargs) 2023-01-11T22:51:00.5990282Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.5990396Z self.run_subtests( 2023-01-11T22:51:00.5990748Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.5990909Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.5991271Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.5991406Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.5991833Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.5991955Z output = model(*input) 2023-01-11T22:51:00.5992286Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.5992424Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.5992802Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.5992975Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.5993343Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.5993448Z _lazy_init(state, module) 2023-01-11T22:51:00.5993799Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.5994026Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.5994430Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.5994572Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.5994908Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.5995031Z return func(*args, **kwargs) 2023-01-11T22:51:00.5995407Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.5995491Z p_assert( 2023-01-11T22:51:00.5995824Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.5995951Z traceback.print_stack() 2023-01-11T22:51:00.5996350Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:51:00.5997136Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.5997274Z File "", line 1, in 2023-01-11T22:51:00.5997485Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.5997627Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.5997830Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.5997963Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.5998176Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.5998278Z self.run() 2023-01-11T22:51:00.5998480Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.5998632Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.5998976Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.5999109Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.5999472Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.5999578Z getattr(self, test_name)() 2023-01-11T22:51:00.5999934Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6000034Z fn() 2023-01-11T22:51:00.6000395Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6000518Z test(self, **param_kwargs) 2023-01-11T22:51:00.6000872Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6000998Z return func(*args, **kwargs) 2023-01-11T22:51:00.6001276Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.6001372Z self.run_subtests( 2023-01-11T22:51:00.6001722Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6001883Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6002245Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6002399Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6002771Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6002892Z output = model(*input) 2023-01-11T22:51:00.6003274Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6003394Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6003771Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6003944Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6004309Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:51:00.6004431Z _lazy_init(state, module) 2023-01-11T22:51:00.6004784Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6004951Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6005345Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6005472Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6005807Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6005976Z return func(*args, **kwargs) 2023-01-11T22:51:00.6006359Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6006463Z p_assert( 2023-01-11T22:51:00.6006797Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6006922Z traceback.print_stack() 2023-01-11T22:51:00.6007164Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 0 2023-01-11T22:51:00.6007387Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 1 2023-01-11T22:51:00.6007784Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:51:00.6008536Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.6008666Z File "", line 1, in 2023-01-11T22:51:00.6008876Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6009018Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6009221Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6009372Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6009585Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6009672Z self.run() 2023-01-11T22:51:00.6009879Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6010025Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6010367Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6010501Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6010862Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6010985Z getattr(self, test_name)() 2023-01-11T22:51:00.6011340Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6011421Z fn() 2023-01-11T22:51:00.6011784Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6011906Z test(self, **param_kwargs) 2023-01-11T22:51:00.6012260Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6012439Z return func(*args, **kwargs) 2023-01-11T22:51:00.6012718Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.6012833Z self.run_subtests( 2023-01-11T22:51:00.6013168Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6013329Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6013689Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6013842Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6014215Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6014333Z output = model(*input) 2023-01-11T22:51:00.6014660Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6014797Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6015216Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6015379Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6015747Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.6015866Z _lazy_init(state, module) 2023-01-11T22:51:00.6016216Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6016383Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6016952Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6017103Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6017448Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6017560Z return func(*args, **kwargs) 2023-01-11T22:51:00.6017936Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6018037Z p_assert( 2023-01-11T22:51:00.6018373Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6018499Z traceback.print_stack() 2023-01-11T22:51:00.6018895Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:51:00.6019636Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6019770Z File "", line 1, in 2023-01-11T22:51:00.6019981Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6020105Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6020305Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6020454Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6020664Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6020767Z self.run() 2023-01-11T22:51:00.6020968Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6021112Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6021432Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6021646Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6022016Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6022141Z getattr(self, test_name)() 2023-01-11T22:51:00.6022497Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6022596Z fn() 2023-01-11T22:51:00.6022957Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6023080Z test(self, **param_kwargs) 2023-01-11T22:51:00.6023413Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6023539Z return func(*args, **kwargs) 2023-01-11T22:51:00.6023813Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.6023931Z self.run_subtests( 2023-01-11T22:51:00.6024353Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6024524Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6024889Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6025041Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6025398Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6025518Z output = model(*input) 2023-01-11T22:51:00.6025841Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6025979Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6026352Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6026533Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6026899Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:51:00.6027020Z _lazy_init(state, module) 2023-01-11T22:51:00.6027353Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6027521Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6027915Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6028059Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6028397Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6028524Z return func(*args, **kwargs) 2023-01-11T22:51:00.6028898Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6029003Z p_assert( 2023-01-11T22:51:00.6029322Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6029450Z traceback.print_stack() 2023-01-11T22:51:00.6029692Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 0 2023-01-11T22:51:00.6029936Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 1 2023-01-11T22:51:00.6030332Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:51:00.6031074Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.6031264Z File "", line 1, in 2023-01-11T22:51:00.6031473Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6031616Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6031800Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6031951Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6032162Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6032265Z self.run() 2023-01-11T22:51:00.6032467Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6032611Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6032954Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6033093Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6033476Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6033607Z getattr(self, test_name)() 2023-01-11T22:51:00.6033963Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6034061Z fn() 2023-01-11T22:51:00.6034422Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6034543Z test(self, **param_kwargs) 2023-01-11T22:51:00.6034896Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6035022Z return func(*args, **kwargs) 2023-01-11T22:51:00.6035280Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.6035398Z self.run_subtests( 2023-01-11T22:51:00.6035749Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6035910Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6036275Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6036425Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6036797Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6036916Z output = model(*input) 2023-01-11T22:51:00.6037220Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6037362Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6037739Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6037912Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6038279Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.6038399Z _lazy_init(state, module) 2023-01-11T22:51:00.6038747Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6038913Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6039288Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6039433Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6039766Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6039945Z return func(*args, **kwargs) 2023-01-11T22:51:00.6040323Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6040430Z p_assert( 2023-01-11T22:51:00.6040763Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6040888Z traceback.print_stack() 2023-01-11T22:51:00.6041267Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:51:00.6042011Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6042145Z File "", line 1, in 2023-01-11T22:51:00.6042358Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6042498Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6042744Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6042902Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6043114Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6043219Z self.run() 2023-01-11T22:51:00.6043402Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6043548Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6043891Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6044024Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6044386Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6044514Z getattr(self, test_name)() 2023-01-11T22:51:00.6044874Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6044956Z fn() 2023-01-11T22:51:00.6045318Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6045442Z test(self, **param_kwargs) 2023-01-11T22:51:00.6045798Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6045922Z return func(*args, **kwargs) 2023-01-11T22:51:00.6046196Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.6046309Z self.run_subtests( 2023-01-11T22:51:00.6046658Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6046805Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6047172Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6047322Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6047697Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6047816Z output = model(*input) 2023-01-11T22:51:00.6048139Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6048277Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6048649Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6048805Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6049236Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:51:00.6049358Z _lazy_init(state, module) 2023-01-11T22:51:00.6049711Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6049880Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6050276Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6050418Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6050756Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6050881Z return func(*args, **kwargs) 2023-01-11T22:51:00.6051239Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6051346Z p_assert( 2023-01-11T22:51:00.6051678Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6051804Z traceback.print_stack() 2023-01-11T22:51:00.6052091Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 0 2023-01-11T22:51:00.6052342Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 1 2023-01-11T22:51:00.6052738Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:51:00.6053478Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.6053608Z File "", line 1, in 2023-01-11T22:51:00.6053804Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6053946Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6054152Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6054302Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6054511Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6054615Z self.run() 2023-01-11T22:51:00.6054814Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6054944Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6055285Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6055416Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6055774Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6055900Z getattr(self, test_name)() 2023-01-11T22:51:00.6056260Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6056358Z fn() 2023-01-11T22:51:00.6056893Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6057007Z test(self, **param_kwargs) 2023-01-11T22:51:00.6057373Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6057498Z return func(*args, **kwargs) 2023-01-11T22:51:00.6057775Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.6057890Z self.run_subtests( 2023-01-11T22:51:00.6058240Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6058487Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6058857Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6058990Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6059365Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6059486Z output = model(*input) 2023-01-11T22:51:00.6059809Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6059948Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6060322Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6060494Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6060865Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.6060968Z _lazy_init(state, module) 2023-01-11T22:51:00.6061377Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6061555Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6061956Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6062098Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6062436Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6062564Z return func(*args, **kwargs) 2023-01-11T22:51:00.6062939Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6063028Z p_assert( 2023-01-11T22:51:00.6063364Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6063493Z traceback.print_stack() 2023-01-11T22:51:00.6063890Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:51:00.6064634Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6064768Z File "", line 1, in 2023-01-11T22:51:00.6064977Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6065117Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6065322Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6065456Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6065669Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6065772Z self.run() 2023-01-11T22:51:00.6065973Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6066118Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6066462Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6066594Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6066951Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6067057Z getattr(self, test_name)() 2023-01-11T22:51:00.6067412Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6067568Z fn() 2023-01-11T22:51:00.6067935Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6068063Z test(self, **param_kwargs) 2023-01-11T22:51:00.6068418Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6068543Z return func(*args, **kwargs) 2023-01-11T22:51:00.6068819Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.6068917Z self.run_subtests( 2023-01-11T22:51:00.6069268Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6069428Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6069790Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6069946Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6070363Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6070489Z output = model(*input) 2023-01-11T22:51:00.6070818Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6070939Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6071311Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6071485Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6071851Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:51:00.6071973Z _lazy_init(state, module) 2023-01-11T22:51:00.6072329Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6072493Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6072892Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6073018Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6073355Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6073479Z return func(*args, **kwargs) 2023-01-11T22:51:00.6073854Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6073956Z p_assert( 2023-01-11T22:51:00.6074288Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6074414Z traceback.print_stack() 2023-01-11T22:51:00.6074660Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 0 2023-01-11T22:51:00.6074883Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 1 2023-01-11T22:51:00.6075283Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:51:00.6076027Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6076765Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6077567Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6078302Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6079026Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.6079162Z File "", line 1, in 2023-01-11T22:51:00.6079417Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6079568Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6079773Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6079922Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6080118Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6080224Z self.run() 2023-01-11T22:51:00.6080426Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6080573Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6080916Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6081048Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6081414Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6081537Z getattr(self, test_name)() 2023-01-11T22:51:00.6081881Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6081980Z fn() 2023-01-11T22:51:00.6082340Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6082462Z test(self, **param_kwargs) 2023-01-11T22:51:00.6082817Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6082940Z return func(*args, **kwargs) 2023-01-11T22:51:00.6083215Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.6083331Z self.run_subtests( 2023-01-11T22:51:00.6083663Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6083831Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6084192Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6084345Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6084718Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6084837Z output = model(*input) 2023-01-11T22:51:00.6085161Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6085298Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6085655Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6085885Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6086257Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.6086376Z _lazy_init(state, module) 2023-01-11T22:51:00.6086729Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6086897Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6087292Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6087433Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6087754Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6087880Z return func(*args, **kwargs) 2023-01-11T22:51:00.6088257Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6088359Z p_assert( 2023-01-11T22:51:00.6088752Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6088886Z traceback.print_stack() 2023-01-11T22:51:00.6089285Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:51:00.6090024Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6090758Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6091503Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6092296Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6093026Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6093163Z File "", line 1, in 2023-01-11T22:51:00.6093359Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6093501Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6093704Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6093853Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6094064Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6094172Z self.run() 2023-01-11T22:51:00.6094372Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6094500Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6094840Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6095032Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6095399Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6095521Z getattr(self, test_name)() 2023-01-11T22:51:00.6095881Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6095980Z fn() 2023-01-11T22:51:00.6096340Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6096446Z test(self, **param_kwargs) 2023-01-11T22:51:00.6096987Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6097116Z return func(*args, **kwargs) 2023-01-11T22:51:00.6097392Z File 
"/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.6097510Z self.run_subtests( 2023-01-11T22:51:00.6097967Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6098139Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6098504Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6098638Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6099014Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6099134Z output = model(*input) 2023-01-11T22:51:00.6099459Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6099595Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6099973Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6100146Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6100512Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.6100632Z _lazy_init(state, module) 2023-01-11T22:51:00.6100968Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6101135Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6101530Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6101672Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6102009Z File 
"/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6102137Z return func(*args, **kwargs) 2023-01-11T22:51:00.6102510Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6102614Z p_assert( 2023-01-11T22:51:00.6102931Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6103057Z traceback.print_stack() 2023-01-11T22:51:00.6103301Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 0 2023-01-11T22:51:00.6103540Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 1 2023-01-11T22:51:00.6103940Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:51:00.6104679Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.6104898Z File "", line 1, in 2023-01-11T22:51:00.6105109Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6105251Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6105434Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6105584Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6105795Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6105898Z self.run() 2023-01-11T22:51:00.6106101Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6106246Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6106590Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6106708Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6107116Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6107248Z getattr(self, test_name)() 2023-01-11T22:51:00.6107610Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6107709Z fn() 2023-01-11T22:51:00.6108073Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6108196Z test(self, **param_kwargs) 2023-01-11T22:51:00.6108550Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6108658Z return func(*args, **kwargs) 2023-01-11T22:51:00.6108934Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.6109054Z self.run_subtests( 2023-01-11T22:51:00.6109408Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6109570Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6109934Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6110084Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6110455Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6110557Z output = model(*input) 2023-01-11T22:51:00.6110881Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6111017Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6111393Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6111570Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6111937Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.6112056Z _lazy_init(state, module) 2023-01-11T22:51:00.6112406Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6112557Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6112951Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6113092Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6113430Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6113609Z return func(*args, **kwargs) 2023-01-11T22:51:00.6113994Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6114097Z p_assert( 2023-01-11T22:51:00.6114434Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6114543Z traceback.print_stack() 2023-01-11T22:51:00.6114939Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:51:00.6115678Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6115807Z File "", line 1, in 2023-01-11T22:51:00.6116019Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6116160Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6116406Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6116568Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6116780Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6116867Z self.run() 2023-01-11T22:51:00.6117067Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6117213Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6117553Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6117686Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6118047Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6118176Z getattr(self, test_name)() 2023-01-11T22:51:00.6118535Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6118616Z fn() 2023-01-11T22:51:00.6118975Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6119096Z test(self, **param_kwargs) 2023-01-11T22:51:00.6119449Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6119573Z return func(*args, **kwargs) 2023-01-11T22:51:00.6119848Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.6119965Z self.run_subtests( 2023-01-11T22:51:00.6120315Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6120462Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6120824Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6120977Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6121350Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6121469Z output = model(*input) 2023-01-11T22:51:00.6121795Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6121932Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6122305Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6122459Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6122880Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:51:00.6123002Z _lazy_init(state, module) 2023-01-11T22:51:00.6123358Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6123527Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6123922Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6124064Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6124398Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6124505Z return func(*args, **kwargs) 2023-01-11T22:51:00.6124878Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6124984Z p_assert( 2023-01-11T22:51:00.6125317Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6125486Z traceback.print_stack() 2023-01-11T22:51:00.6125738Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 0 2023-01-11T22:51:00.6125975Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 1 2023-01-11T22:51:00.6126372Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:51:00.6127111Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.6127228Z File "", line 1, in 2023-01-11T22:51:00.6127435Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6127578Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6127781Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6127931Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6128143Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6128246Z self.run() 2023-01-11T22:51:00.6128449Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6128578Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6128916Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6129047Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6129409Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6129537Z getattr(self, test_name)() 2023-01-11T22:51:00.6129899Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6129996Z fn() 2023-01-11T22:51:00.6130340Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6130464Z test(self, **param_kwargs) 2023-01-11T22:51:00.6130816Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6130940Z return func(*args, **kwargs) 2023-01-11T22:51:00.6131216Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.6131330Z self.run_subtests( 2023-01-11T22:51:00.6131679Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6131893Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6132244Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6132397Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6132769Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6132886Z output = model(*input) 2023-01-11T22:51:00.6133211Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6133349Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6133723Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6133896Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6134263Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.6134366Z _lazy_init(state, module) 2023-01-11T22:51:00.6134760Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6134934Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6135331Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6135474Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6135812Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6135935Z return func(*args, **kwargs) 2023-01-11T22:51:00.6136307Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6136397Z p_assert( 2023-01-11T22:51:00.6136907Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6137044Z traceback.print_stack() 2023-01-11T22:51:00.6137448Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:51:00.6138191Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6138324Z File "", line 1, in 2023-01-11T22:51:00.6138534Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6138676Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6138862Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6139015Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6139229Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6139332Z self.run() 2023-01-11T22:51:00.6139533Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6139678Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6140014Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6140147Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6140486Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6140610Z getattr(self, test_name)() 2023-01-11T22:51:00.6140966Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6141146Z fn() 2023-01-11T22:51:00.6141517Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6141646Z test(self, **param_kwargs) 2023-01-11T22:51:00.6142000Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6142125Z return func(*args, **kwargs) 2023-01-11T22:51:00.6142383Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.6142498Z self.run_subtests( 2023-01-11T22:51:00.6142845Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6143009Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6143367Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6143524Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6143951Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6144082Z output = model(*input) 2023-01-11T22:51:00.6144392Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6144536Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6144910Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6145084Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6145446Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:51:00.6145568Z _lazy_init(state, module) 2023-01-11T22:51:00.6145923Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6146088Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6146466Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6146608Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6146944Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6147068Z return func(*args, **kwargs) 2023-01-11T22:51:00.6147443Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6147546Z p_assert( 2023-01-11T22:51:00.6147879Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6148008Z traceback.print_stack() 2023-01-11T22:51:00.6148235Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 0 2023-01-11T22:51:00.6148474Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 1 2023-01-11T22:51:00.6148872Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:51:00.6149617Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.6149748Z File "", line 1, in 2023-01-11T22:51:00.6149960Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6150100Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6150359Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6150509Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6150707Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6150813Z self.run() 2023-01-11T22:51:00.6151012Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6151159Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6151500Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6151633Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6151989Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6152113Z getattr(self, test_name)() 2023-01-11T22:51:00.6152452Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6152554Z fn() 2023-01-11T22:51:00.6152975Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6153107Z test(self, **param_kwargs) 2023-01-11T22:51:00.6153464Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6153591Z return func(*args, **kwargs) 2023-01-11T22:51:00.6153864Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.6153980Z self.run_subtests( 2023-01-11T22:51:00.6154313Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6154474Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6154835Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6154990Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6155367Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6155486Z output = model(*input) 2023-01-11T22:51:00.6155809Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6155945Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6156300Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6156474Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6156838Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.6156958Z _lazy_init(state, module) 2023-01-11T22:51:00.6157314Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6157482Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6157876Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6158018Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6158338Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6158464Z return func(*args, **kwargs) 2023-01-11T22:51:00.6158838Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6158940Z p_assert( 2023-01-11T22:51:00.6159275Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6159457Z traceback.print_stack() 2023-01-11T22:51:00.6159862Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:51:00.6160607Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6160741Z File "", line 1, in 2023-01-11T22:51:00.6160929Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6161071Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6161273Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6161422Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6161632Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6161740Z self.run() 2023-01-11T22:51:00.6161939Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6162115Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6162468Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6162602Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6162961Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6163084Z getattr(self, test_name)() 2023-01-11T22:51:00.6163440Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6163538Z fn() 2023-01-11T22:51:00.6163899Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6164008Z test(self, **param_kwargs) 2023-01-11T22:51:00.6164363Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6164492Z return func(*args, **kwargs) 2023-01-11T22:51:00.6164767Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.6164880Z self.run_subtests( 2023-01-11T22:51:00.6165231Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6165392Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6165755Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6165889Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6166265Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6166388Z output = model(*input) 2023-01-11T22:51:00.6166716Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6166853Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6167228Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6167401Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6167768Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:51:00.6167888Z _lazy_init(state, module) 2023-01-11T22:51:00.6168220Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6168386Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6168837Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6168982Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6169320Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6169444Z return func(*args, **kwargs) 2023-01-11T22:51:00.6169817Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6169920Z p_assert( 2023-01-11T22:51:00.6170235Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6170360Z traceback.print_stack() 2023-01-11T22:51:00.6170604Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 0 2023-01-11T22:51:00.6170849Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 1 2023-01-11T22:51:00.6171248Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2023-01-11T22:51:00.6172044Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.6172444Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2023-01-11T22:51:00.6173171Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6173419Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 0 2023-01-11T22:51:00.6173652Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 1 2023-01-11T22:51:00.6174023Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:51:00.6174415Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:51:00.6174653Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 0 2023-01-11T22:51:00.6174885Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 1 2023-01-11T22:51:00.6175274Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2023-01-11T22:51:00.6175669Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 
2023-01-11T22:51:00.6175907Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 0 2023-01-11T22:51:00.6176138Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 1 2023-01-11T22:51:00.6176524Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 2023-01-11T22:51:00.6177192Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 2023-01-11T22:51:00.6177413Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 0 2023-01-11T22:51:00.6177645Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 1 2023-01-11T22:51:00.6178037Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:51:00.6178525Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:51:00.6178763Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 0 2023-01-11T22:51:00.6178994Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 1 2023-01-11T22:51:00.6179381Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2023-01-11T22:51:00.6179768Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 
2023-01-11T22:51:00.6180003Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 0 2023-01-11T22:51:00.6180215Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 1 2023-01-11T22:51:00.6180652Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:51:00.6181110Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:51:00.6181867Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6182112Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 0 2023-01-11T22:51:00.6182343Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 1 2023-01-11T22:51:00.6182733Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:51:00.6183127Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:51:00.6183862Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6184589Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6185325Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6186061Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6186786Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6187510Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6188295Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.6189022Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6189784Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6190517Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6191239Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6192012Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6192744Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6193467Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6194186Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6194908Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6195632Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6196354Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6197143Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6197862Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6198585Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6199351Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6200078Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6200803Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.6201037Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 0 2023-01-11T22:51:00.6201276Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 1 2023-01-11T22:51:00.6201675Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 2023-01-11T22:51:00.6202065Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 2023-01-11T22:51:00.6202785Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6203805Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py:451: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:51:00.6203942Z shapes.append(param.shape) 2023-01-11T22:51:00.6204183Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 0 2023-01-11T22:51:00.6204416Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 1 2023-01-11T22:51:00.6204808Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 
2023-01-11T22:51:00.6205254Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:51:00.6205985Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6206212Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 0 2023-01-11T22:51:00.6206446Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 1 2023-01-11T22:51:00.6206836Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 2023-01-11T22:51:00.6207226Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 2023-01-11T22:51:00.6208014Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6208264Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 0 2023-01-11T22:51:00.6208497Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 1 2023-01-11T22:51:00.6208888Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:51:00.6209278Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 
2023-01-11T22:51:00.6210015Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6210263Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:26 to store for rank: 0 2023-01-11T22:51:00.6210480Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:26 to store for rank: 1 2023-01-11T22:51:00.6210871Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:26 with 2 nodes. 2023-01-11T22:51:00.6211259Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:26 with 2 nodes. 2023-01-11T22:51:00.6211990Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6212237Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:27 to store for rank: 0 2023-01-11T22:51:00.6212470Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:27 to store for rank: 1 2023-01-11T22:51:00.6212858Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:27 with 2 nodes. 2023-01-11T22:51:00.6213247Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:27 with 2 nodes. 2023-01-11T22:51:00.6213977Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6214269Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:28 to store for rank: 0 2023-01-11T22:51:00.6214509Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:28 to store for rank: 1 2023-01-11T22:51:00.6214901Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:28 with 2 nodes. 2023-01-11T22:51:00.6215274Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:28 with 2 nodes. 2023-01-11T22:51:00.6216006Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6216247Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:29 to store for rank: 0 2023-01-11T22:51:00.6216485Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:29 to store for rank: 1 2023-01-11T22:51:00.6217145Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:29 with 2 nodes. 2023-01-11T22:51:00.6217553Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:29 with 2 nodes. 2023-01-11T22:51:00.6218285Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6218527Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:30 to store for rank: 0 2023-01-11T22:51:00.6218765Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:30 to store for rank: 1 2023-01-11T22:51:00.6219155Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:30 with 2 nodes. 2023-01-11T22:51:00.6219541Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:30 with 2 nodes. 2023-01-11T22:51:00.6220256Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6220496Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:31 to store for rank: 0 2023-01-11T22:51:00.6220731Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:31 to store for rank: 1 2023-01-11T22:51:00.6221123Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:31 with 2 nodes. 2023-01-11T22:51:00.6221513Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:31 with 2 nodes. 2023-01-11T22:51:00.6222246Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.6222485Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:32 to store for rank: 0 2023-01-11T22:51:00.6222720Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:32 to store for rank: 1 2023-01-11T22:51:00.6223107Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:32 with 2 nodes. 2023-01-11T22:51:00.6223571Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:32 with 2 nodes. 2023-01-11T22:51:00.6224314Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6224554Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:33 to store for rank: 0 2023-01-11T22:51:00.6224772Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:33 to store for rank: 1 2023-01-11T22:51:00.6225164Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:33 with 2 nodes. 2023-01-11T22:51:00.6225550Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:33 with 2 nodes. 2023-01-11T22:51:00.6226331Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6227071Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6227800Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6228533Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6229259Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6229982Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6230712Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.6231436Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6232157Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6232938Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6233662Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6234382Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6235148Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6235878Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6236602Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6237326Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6238048Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6238766Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6239491Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6240213Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6240936Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6241755Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6242476Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6243189Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.6243959Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6244689Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6245407Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6246128Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6246371Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:34 to store for rank: 0 2023-01-11T22:51:00.6246609Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:34 to store for rank: 1 2023-01-11T22:51:00.6247005Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:34 with 2 nodes. 2023-01-11T22:51:00.6247722Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6248116Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:34 with 2 nodes. 2023-01-11T22:51:00.6248811Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6249050Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:35 to store for rank: 0 2023-01-11T22:51:00.6249284Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:35 to store for rank: 1 2023-01-11T22:51:00.6249672Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:35 with 2 nodes. 2023-01-11T22:51:00.6250446Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6250831Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:35 with 2 nodes. 2023-01-11T22:51:00.6251546Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.6251781Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:36 to store for rank: 0 2023-01-11T22:51:00.6252013Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:36 to store for rank: 1 2023-01-11T22:51:00.6252400Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:36 with 2 nodes. 2023-01-11T22:51:00.6253153Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6253545Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:36 with 2 nodes. 2023-01-11T22:51:00.6254258Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6254494Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:37 to store for rank: 0 2023-01-11T22:51:00.6254710Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:37 to store for rank: 1 2023-01-11T22:51:00.6255092Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:37 with 2 nodes. 2023-01-11T22:51:00.6255805Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.6256188Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:37 with 2 nodes. 2023-01-11T22:51:00.6257074Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6257204Z dist init r=0, world=2 2023-01-11T22:51:00.6257534Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.6257849Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.6258156Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.6258457Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.6258841Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.6259145Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.6259425Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 
2023-01-11T22:51:00.6259722Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.6260018Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.6260316Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.6260667Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.6260973Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.6261087Z dist init r=1, world=2 2023-01-11T22:51:00.6261409Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.6261721Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.6262026Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.6262337Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.6262620Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 
2023-01-11T22:51:00.6262921Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.6263218Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.6263515Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.6263819Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.6264115Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.6264413Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.6264713Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.6264814Z ok (31.254s) 2023-01-11T22:51:00.6265176Z test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 94013 2023-01-11T22:51:00.6265448Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 94014 2023-01-11T22:51:00.6265818Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.6266001Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.6266381Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.6266573Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.6266943Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.6267116Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.6267489Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.6267680Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.6267967Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.6268197Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.6268596Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.6268992Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:51:00.6269219Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.6269447Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.6270460Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.6270576Z warnings.warn( 2023-01-11T22:51:00.6270818Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:51:00.6271811Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.6271923Z warnings.warn( 2023-01-11T22:51:00.6272157Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:51:00.6272534Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:51:00.6273068Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 
2023-01-11T22:51:00.6273178Z warnings.warn( 2023-01-11T22:51:00.6273918Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6274370Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:51:00.6274906Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2023-01-11T22:51:00.6275017Z warnings.warn( 2023-01-11T22:51:00.6275148Z File "<string>", line 1, in <module> 2023-01-11T22:51:00.6275357Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6275498Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6275683Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6275834Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6276047Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6276150Z self.run() 2023-01-11T22:51:00.6276355Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6276500Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6276909Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6277032Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6277399Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6277523Z getattr(self, test_name)() 2023-01-11T22:51:00.6277885Z File
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6277983Z fn() 2023-01-11T22:51:00.6278347Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6278470Z test(self, **param_kwargs) 2023-01-11T22:51:00.6278827Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6278939Z return func(*args, **kwargs) 2023-01-11T22:51:00.6279218Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.6279331Z self.run_subtests( 2023-01-11T22:51:00.6279686Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6279847Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6280210Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6280360Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6280734Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6280837Z output = model(*input) 2023-01-11T22:51:00.6281165Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6281303Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6281680Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6281855Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6282219Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.6282338Z _lazy_init(state, module) 2023-01-11T22:51:00.6282690Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6282839Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6283236Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6283441Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6283783Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6283910Z return func(*args, **kwargs) 2023-01-11T22:51:00.6284286Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6284389Z p_assert( 2023-01-11T22:51:00.6284723Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6284831Z traceback.print_stack() 2023-01-11T22:51:00.6285575Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.6285710Z File "<string>", line 1, in <module> 2023-01-11T22:51:00.6285919Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6286103Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6286312Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6286462Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6286674Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6286779Z self.run() 2023-01-11T22:51:00.6286963Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6287107Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6287448Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6287580Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6287946Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6288068Z getattr(self, test_name)() 2023-01-11T22:51:00.6288429Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6288510Z fn() 2023-01-11T22:51:00.6288872Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6288996Z test(self, **param_kwargs) 2023-01-11T22:51:00.6289350Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6289474Z return func(*args, **kwargs) 2023-01-11T22:51:00.6289747Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.6289860Z self.run_subtests( 2023-01-11T22:51:00.6290211Z File
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6290354Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6290717Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6290869Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6291241Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6291359Z output = model(*input) 2023-01-11T22:51:00.6291728Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6291870Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6292248Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6292483Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6292832Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.6292958Z _lazy_init(state, module) 2023-01-11T22:51:00.6293312Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6293479Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6293876Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6294018Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6294353Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6294478Z return func(*args, **kwargs) 2023-01-11T22:51:00.6294833Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6294940Z p_assert( 2023-01-11T22:51:00.6295319Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6295452Z traceback.print_stack() 2023-01-11T22:51:00.6295697Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:51:00.6295939Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:51:00.6296340Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:51:00.6297262Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.6297402Z File "<string>", line 1, in <module> 2023-01-11T22:51:00.6297594Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6297743Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6297946Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6298096Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6298309Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6298414Z self.run() 2023-01-11T22:51:00.6298611Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6298740Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6299085Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6299220Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6299584Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6299709Z getattr(self, test_name)() 2023-01-11T22:51:00.6300070Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6300167Z fn() 2023-01-11T22:51:00.6300528Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6300633Z test(self, **param_kwargs) 2023-01-11T22:51:00.6300988Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6301112Z return func(*args, **kwargs) 2023-01-11T22:51:00.6301386Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.6301501Z self.run_subtests( 2023-01-11T22:51:00.6301942Z File
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6302102Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6302467Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6302601Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6302979Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6303097Z output = model(*input) 2023-01-11T22:51:00.6303421Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6303557Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6303929Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6304104Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6304469Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.6304632Z _lazy_init(state, module) 2023-01-11T22:51:00.6304997Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6305164Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6305559Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6305701Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6306036Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6306162Z return func(*args, **kwargs) 2023-01-11T22:51:00.6306538Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6306645Z p_assert( 2023-01-11T22:51:00.6306963Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6307090Z traceback.print_stack() 2023-01-11T22:51:00.6307488Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:51:00.6308230Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6308361Z File "", line 1, in 2023-01-11T22:51:00.6308568Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6308712Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6308913Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6309045Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6309259Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6309376Z self.run() 2023-01-11T22:51:00.6309560Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6309707Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6310045Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6310177Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6310539Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6310661Z getattr(self, test_name)() 2023-01-11T22:51:00.6311016Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6311154Z fn() 2023-01-11T22:51:00.6311523Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6311647Z test(self, **param_kwargs) 2023-01-11T22:51:00.6312000Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6312124Z return func(*args, **kwargs) 2023-01-11T22:51:00.6312399Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.6312512Z self.run_subtests( 2023-01-11T22:51:00.6312865Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6313008Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6313371Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6313522Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6313938Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6314063Z output = model(*input) 2023-01-11T22:51:00.6314390Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6314528Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6314905Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6315061Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6315428Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:51:00.6315553Z _lazy_init(state, module) 2023-01-11T22:51:00.6315904Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6316074Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6316469Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6316613Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6316948Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6317072Z return func(*args, **kwargs) 2023-01-11T22:51:00.6317429Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6317529Z p_assert( 2023-01-11T22:51:00.6317858Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6317988Z traceback.print_stack() 2023-01-11T22:51:00.6318231Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2023-01-11T22:51:00.6318478Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2023-01-11T22:51:00.6318871Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:51:00.6319610Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.6319741Z File "<string>", line 1, in <module> 2023-01-11T22:51:00.6319933Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6320130Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6320331Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6320485Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6320696Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6320801Z self.run() 2023-01-11T22:51:00.6321002Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6321129Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6321469Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6321602Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6321963Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6322084Z getattr(self, test_name)() 2023-01-11T22:51:00.6322444Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6322541Z fn() 2023-01-11T22:51:00.6322948Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6323059Z test(self, **param_kwargs) 2023-01-11T22:51:00.6323415Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6323539Z return func(*args, **kwargs) 2023-01-11T22:51:00.6323812Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.6323926Z self.run_subtests( 2023-01-11T22:51:00.6324275Z File
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6324438Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6324805Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6324939Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6325317Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6325438Z output = model(*input) 2023-01-11T22:51:00.6325763Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6325900Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6326272Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6326447Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6326812Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.6326918Z _lazy_init(state, module) 2023-01-11T22:51:00.6327271Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6327445Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6327844Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6327986Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6328322Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6328448Z return func(*args, **kwargs) 2023-01-11T22:51:00.6328823Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6328908Z p_assert( 2023-01-11T22:51:00.6329241Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6329422Z traceback.print_stack() 2023-01-11T22:51:00.6329828Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:51:00.6330573Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6330704Z File "", line 1, in 2023-01-11T22:51:00.6330913Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6331054Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6331256Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6331388Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6331605Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6331709Z self.run() 2023-01-11T22:51:00.6331954Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6332105Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6332443Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6332576Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6332935Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6333041Z getattr(self, test_name)() 2023-01-11T22:51:00.6333397Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6333495Z fn() 2023-01-11T22:51:00.6333859Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6333983Z test(self, **param_kwargs) 2023-01-11T22:51:00.6334342Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6334469Z return func(*args, **kwargs) 2023-01-11T22:51:00.6334745Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.6334840Z self.run_subtests( 2023-01-11T22:51:00.6335187Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6335349Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6335712Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6335865Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6336246Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6336363Z output = model(*input) 2023-01-11T22:51:00.6336868Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6336997Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6337381Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6337558Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6337926Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:51:00.6338046Z _lazy_init(state, module) 2023-01-11T22:51:00.6338397Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6338648Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6339048Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6339177Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6339516Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6339640Z return func(*args, **kwargs) 2023-01-11T22:51:00.6340015Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6340116Z p_assert( 2023-01-11T22:51:00.6340450Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6340575Z traceback.print_stack() 2023-01-11T22:51:00.6340820Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 0 2023-01-11T22:51:00.6341050Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 1 2023-01-11T22:51:00.6341518Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:51:00.6342278Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.6342410Z File "", line 1, in 2023-01-11T22:51:00.6342616Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6342755Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6342955Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6343104Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6343322Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6343409Z self.run() 2023-01-11T22:51:00.6343610Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6343755Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6344094Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6344225Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6344583Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6344707Z getattr(self, test_name)() 2023-01-11T22:51:00.6345066Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6345147Z fn() 2023-01-11T22:51:00.6345510Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6345633Z test(self, **param_kwargs) 2023-01-11T22:51:00.6345989Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6346115Z return func(*args, **kwargs) 2023-01-11T22:51:00.6346390Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.6346501Z self.run_subtests( 2023-01-11T22:51:00.6346835Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6346996Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6347355Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6347502Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6347931Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6348051Z output = model(*input) 2023-01-11T22:51:00.6348378Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6348514Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6348888Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6349045Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6349410Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.6349530Z _lazy_init(state, module) 2023-01-11T22:51:00.6349880Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6350047Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6350490Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6350639Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6350980Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6351087Z return func(*args, **kwargs) 2023-01-11T22:51:00.6351460Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6351564Z p_assert( 2023-01-11T22:51:00.6351898Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6352023Z traceback.print_stack() 2023-01-11T22:51:00.6352420Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:51:00.6353172Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6353304Z File "", line 1, in 2023-01-11T22:51:00.6353513Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6353637Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6353839Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6353990Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6354201Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6354305Z self.run() 2023-01-11T22:51:00.6354507Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6354654Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6354979Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6355111Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6355468Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6355590Z getattr(self, test_name)() 2023-01-11T22:51:00.6355947Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6356046Z fn() 2023-01-11T22:51:00.6356403Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6356526Z test(self, **param_kwargs) 2023-01-11T22:51:00.6356862Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6357039Z return func(*args, **kwargs) 2023-01-11T22:51:00.6357318Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.6357432Z self.run_subtests( 2023-01-11T22:51:00.6357785Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6357945Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6358306Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6358456Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6358808Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6358928Z output = model(*input) 2023-01-11T22:51:00.6359252Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6359390Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6359808Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6359989Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6360356Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:51:00.6360476Z _lazy_init(state, module) 2023-01-11T22:51:00.6360807Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6360974Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6361371Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6361516Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6361853Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6361981Z return func(*args, **kwargs) 2023-01-11T22:51:00.6362354Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6362456Z p_assert( 2023-01-11T22:51:00.6362772Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6362898Z traceback.print_stack() 2023-01-11T22:51:00.6363144Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 1 2023-01-11T22:51:00.6363385Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 0 2023-01-11T22:51:00.6363779Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:51:00.6364170Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:51:00.6364918Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.6365050Z File "", line 1, in 2023-01-11T22:51:00.6365258Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6365382Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6365583Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6365732Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6365998Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6366101Z self.run() 2023-01-11T22:51:00.6366301Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6366449Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6366790Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6366904Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6367262Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6367384Z getattr(self, test_name)() 2023-01-11T22:51:00.6367741Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6367840Z fn() 2023-01-11T22:51:00.6368199Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6368326Z test(self, **param_kwargs) 2023-01-11T22:51:00.6368676Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6368828Z return func(*args, **kwargs) 2023-01-11T22:51:00.6369111Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.6369224Z self.run_subtests( 2023-01-11T22:51:00.6369576Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6369737Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6370099Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6370249Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6370618Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6370724Z output = model(*input) 2023-01-11T22:51:00.6371052Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6371187Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6371561Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6371734Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6372097Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.6372218Z _lazy_init(state, module) 2023-01-11T22:51:00.6372568Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6372716Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6373114Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6373259Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6373597Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6373725Z return func(*args, **kwargs) 2023-01-11T22:51:00.6374098Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6374200Z p_assert( 2023-01-11T22:51:00.6374534Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6374642Z traceback.print_stack() 2023-01-11T22:51:00.6375382Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6375567Z File "", line 1, in 2023-01-11T22:51:00.6375782Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6375924Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6376124Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6376273Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6376485Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6376755Z self.run() 2023-01-11T22:51:00.6376953Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6377098Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6377446Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6377584Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6378042Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6378175Z getattr(self, test_name)() 2023-01-11T22:51:00.6378533Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6378613Z fn() 
2023-01-11T22:51:00.6378973Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6379095Z test(self, **param_kwargs) 2023-01-11T22:51:00.6379450Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6379575Z return func(*args, **kwargs) 2023-01-11T22:51:00.6379850Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.6379970Z self.run_subtests( 2023-01-11T22:51:00.6380321Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6380464Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6380823Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6380975Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6381347Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6381465Z output = model(*input) 2023-01-11T22:51:00.6381789Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6381926Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6382299Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6382453Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6382821Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.6382941Z _lazy_init(state, module) 2023-01-11T22:51:00.6383293Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6383456Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6383848Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6383991Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6384329Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6384525Z return func(*args, **kwargs) 2023-01-11T22:51:00.6384884Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6384990Z p_assert( 2023-01-11T22:51:00.6385326Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6385451Z traceback.print_stack() 2023-01-11T22:51:00.6385692Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 0 2023-01-11T22:51:00.6385935Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 1 2023-01-11T22:51:00.6386331Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:51:00.6387067Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.6387246Z File "", line 1, in 2023-01-11T22:51:00.6387442Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6387592Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6387794Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6387945Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6388154Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6388258Z self.run() 2023-01-11T22:51:00.6388455Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6388582Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6388921Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6389054Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6389413Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6389536Z getattr(self, test_name)() 2023-01-11T22:51:00.6389892Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6389994Z fn() 2023-01-11T22:51:00.6390353Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6390458Z test(self, **param_kwargs) 2023-01-11T22:51:00.6390808Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6390930Z return func(*args, **kwargs) 2023-01-11T22:51:00.6391204Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.6391321Z self.run_subtests( 2023-01-11T22:51:00.6391675Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6391884Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6392254Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6392387Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6392759Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6392878Z output = model(*input) 2023-01-11T22:51:00.6393201Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6393338Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6393771Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6393945Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6394312Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.6394415Z _lazy_init(state, module) 2023-01-11T22:51:00.6394764Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6394930Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6395324Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6395464Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6395802Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6395927Z return func(*args, **kwargs) 2023-01-11T22:51:00.6396302Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6396433Z p_assert( 2023-01-11T22:51:00.6396778Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6396904Z traceback.print_stack() 2023-01-11T22:51:00.6397305Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:51:00.6398045Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6398177Z File "", line 1, in 2023-01-11T22:51:00.6398393Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6398533Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6398731Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6398864Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6399076Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6399179Z self.run() 2023-01-11T22:51:00.6399378Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6399525Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6399867Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6399998Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6400351Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6400461Z getattr(self, test_name)() 2023-01-11T22:51:00.6400818Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6400915Z fn() 2023-01-11T22:51:00.6401279Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6401401Z test(self, **param_kwargs) 2023-01-11T22:51:00.6401752Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6401876Z return func(*args, **kwargs) 2023-01-11T22:51:00.6402151Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.6402248Z self.run_subtests( 2023-01-11T22:51:00.6402595Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6402809Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6403176Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6403327Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6403699Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6403817Z output = model(*input) 2023-01-11T22:51:00.6404134Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6404254Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6404626Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6404797Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6405161Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:51:00.6405279Z _lazy_init(state, module) 2023-01-11T22:51:00.6405686Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6405857Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6406256Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6406380Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6406715Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6406841Z return func(*args, **kwargs) 2023-01-11T22:51:00.6407212Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6407320Z p_assert( 2023-01-11T22:51:00.6407653Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6407778Z traceback.print_stack() 2023-01-11T22:51:00.6408024Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 0 2023-01-11T22:51:00.6408247Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 1 2023-01-11T22:51:00.6408641Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:51:00.6409387Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.6409520Z File "", line 1, in 2023-01-11T22:51:00.6409732Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6409870Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6410071Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6410220Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6410429Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6410515Z self.run() 2023-01-11T22:51:00.6410715Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6410857Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6411199Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6411331Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6411694Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6411869Z getattr(self, test_name)() 2023-01-11T22:51:00.6412215Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6412315Z fn() 2023-01-11T22:51:00.6412675Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6412797Z test(self, **param_kwargs) 2023-01-11T22:51:00.6413152Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6413277Z return func(*args, **kwargs) 2023-01-11T22:51:00.6413549Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.6413663Z self.run_subtests( 2023-01-11T22:51:00.6413994Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6414156Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6414558Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6414717Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6415094Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6415217Z output = model(*input) 2023-01-11T22:51:00.6415542Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6415681Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6416034Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6416205Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6416744Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.6416876Z _lazy_init(state, module) 2023-01-11T22:51:00.6417240Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6417408Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6417803Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6417945Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6418279Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6418386Z return func(*args, **kwargs) 2023-01-11T22:51:00.6418759Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6418866Z p_assert( 2023-01-11T22:51:00.6419197Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6419325Z traceback.print_stack() 2023-01-11T22:51:00.6419726Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:51:00.6420466Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6420595Z File "", line 1, in 2023-01-11T22:51:00.6420804Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6420927Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6421211Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6421360Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6421575Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6421680Z self.run() 2023-01-11T22:51:00.6421884Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6422030Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6422354Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6422482Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6422841Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6422964Z getattr(self, test_name)() 2023-01-11T22:51:00.6423320Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6423422Z fn() 2023-01-11T22:51:00.6423784Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6423965Z test(self, **param_kwargs) 2023-01-11T22:51:00.6424317Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6424440Z return func(*args, **kwargs) 2023-01-11T22:51:00.6424714Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.6424830Z self.run_subtests( 2023-01-11T22:51:00.6425181Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6425342Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6425706Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6425863Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6426222Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6426341Z output = model(*input) 2023-01-11T22:51:00.6426661Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6426800Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6427171Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6427345Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6427708Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:51:00.6427827Z _lazy_init(state, module) 2023-01-11T22:51:00.6428164Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6428330Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6428726Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6428868Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6429204Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6429330Z return func(*args, **kwargs) 2023-01-11T22:51:00.6429704Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6429804Z p_assert( 2023-01-11T22:51:00.6430121Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6430247Z traceback.print_stack() 2023-01-11T22:51:00.6430546Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 0 2023-01-11T22:51:00.6430790Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 1 2023-01-11T22:51:00.6431191Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:51:00.6431933Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.6432065Z File "", line 1, in 2023-01-11T22:51:00.6432272Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6432412Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6432600Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6432747Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6432999Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6433108Z self.run() 2023-01-11T22:51:00.6433309Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6433452Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6433797Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6433930Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6434270Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6434392Z getattr(self, test_name)() 2023-01-11T22:51:00.6434747Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6434850Z fn() 2023-01-11T22:51:00.6435210Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6435336Z test(self, **param_kwargs) 2023-01-11T22:51:00.6435691Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6435816Z return func(*args, **kwargs) 2023-01-11T22:51:00.6436073Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.6436186Z self.run_subtests( 2023-01-11T22:51:00.6436533Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6436695Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6437055Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6437207Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6437580Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6437700Z output = model(*input) 2023-01-11T22:51:00.6438008Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6438141Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6438516Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6438693Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6439058Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.6439176Z _lazy_init(state, module) 2023-01-11T22:51:00.6439579Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6439745Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6440126Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6440269Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6440605Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6440732Z return func(*args, **kwargs) 2023-01-11T22:51:00.6441106Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6441208Z p_assert( 2023-01-11T22:51:00.6441541Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6441668Z traceback.print_stack() 2023-01-11T22:51:00.6442047Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:51:00.6442834Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6442970Z File "", line 1, in 2023-01-11T22:51:00.6443178Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6443318Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6443519Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6443669Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6443882Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6443989Z self.run() 2023-01-11T22:51:00.6444172Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6444320Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6444658Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6444791Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6445150Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6445273Z getattr(self, test_name)() 2023-01-11T22:51:00.6445627Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6445708Z fn() 2023-01-11T22:51:00.6446069Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6446195Z test(self, **param_kwargs) 2023-01-11T22:51:00.6446550Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6446677Z return func(*args, **kwargs) 2023-01-11T22:51:00.6446952Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.6447066Z self.run_subtests( 2023-01-11T22:51:00.6447418Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6447561Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6447920Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6448072Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6448443Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6448613Z output = model(*input) 2023-01-11T22:51:00.6448946Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6449083Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6449459Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6449615Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6449979Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:51:00.6450100Z _lazy_init(state, module) 2023-01-11T22:51:00.6450450Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6450614Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6451010Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6451151Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6451531Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6451661Z return func(*args, **kwargs) 2023-01-11T22:51:00.6452022Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6452125Z p_assert( 2023-01-11T22:51:00.6452459Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6452590Z traceback.print_stack() 2023-01-11T22:51:00.6452833Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 0 2023-01-11T22:51:00.6453067Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 1 2023-01-11T22:51:00.6453468Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:51:00.6454215Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6454951Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6455689Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6456428Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6457340Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.6457457Z File "", line 1, in 2023-01-11T22:51:00.6457668Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6457903Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6458106Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6458261Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6458472Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6465889Z self.run() 2023-01-11T22:51:00.6466144Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6466283Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6466657Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6466783Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6467139Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6467251Z getattr(self, test_name)() 2023-01-11T22:51:00.6467619Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6467706Z fn() 2023-01-11T22:51:00.6468185Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6468319Z test(self, **param_kwargs) 2023-01-11T22:51:00.6468667Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6468793Z return func(*args, **kwargs) 2023-01-11T22:51:00.6469070Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.6469187Z self.run_subtests( 2023-01-11T22:51:00.6469540Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6469701Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6470068Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6470219Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6470577Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6470696Z output = model(*input) 2023-01-11T22:51:00.6471016Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6471151Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6471522Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6471692Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6472052Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.6472176Z _lazy_init(state, module) 2023-01-11T22:51:00.6472511Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6472678Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6473071Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6473212Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6473545Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6473668Z return func(*args, **kwargs) 2023-01-11T22:51:00.6474040Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6474140Z p_assert( 2023-01-11T22:51:00.6474456Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6474641Z traceback.print_stack() 2023-01-11T22:51:00.6475042Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:51:00.6475785Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6476525Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6477322Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6478068Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6478797Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6478929Z File "", line 1, in 2023-01-11T22:51:00.6479142Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6479281Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6479485Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6479632Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6479826Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6479928Z self.run() 2023-01-11T22:51:00.6480127Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6480272Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6480683Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6480820Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6481187Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6481297Z getattr(self, test_name)() 2023-01-11T22:51:00.6481657Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6481751Z fn() 2023-01-11T22:51:00.6482110Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6482230Z test(self, **param_kwargs) 2023-01-11T22:51:00.6482579Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6482700Z return func(*args, **kwargs) 2023-01-11T22:51:00.6482972Z File 
"/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.6483068Z self.run_subtests( 2023-01-11T22:51:00.6483415Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6483637Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6483999Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6484150Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6484520Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6484637Z output = model(*input) 2023-01-11T22:51:00.6484960Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6485081Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6485453Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6485624Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6485986Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.6486101Z _lazy_init(state, module) 2023-01-11T22:51:00.6486493Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6486665Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6487058Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6487199Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6487519Z File 
"/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6487645Z return func(*args, **kwargs) 2023-01-11T22:51:00.6488021Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6488127Z p_assert( 2023-01-11T22:51:00.6488459Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6488585Z traceback.print_stack() 2023-01-11T22:51:00.6488828Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 0 2023-01-11T22:51:00.6489062Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 1 2023-01-11T22:51:00.6489441Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:51:00.6490177Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.6490310Z File "", line 1, in 2023-01-11T22:51:00.6490521Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6490659Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6490863Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6491009Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6491219Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6491305Z self.run() 2023-01-11T22:51:00.6491501Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6491640Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6492037Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6492167Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6492527Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6492711Z getattr(self, test_name)() 2023-01-11T22:51:00.6493074Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6493155Z fn() 2023-01-11T22:51:00.6493515Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6493632Z test(self, **param_kwargs) 2023-01-11T22:51:00.6493980Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6494100Z return func(*args, **kwargs) 2023-01-11T22:51:00.6494376Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.6494488Z self.run_subtests( 2023-01-11T22:51:00.6494837Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6494985Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6495390Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6495549Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6495922Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6496042Z output = model(*input) 2023-01-11T22:51:00.6496364Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6496497Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6497223Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6497381Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6497754Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.6497872Z _lazy_init(state, module) 2023-01-11T22:51:00.6498224Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6498390Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6498782Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6498921Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6499255Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6499363Z return func(*args, **kwargs) 2023-01-11T22:51:00.6499736Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6499841Z p_assert( 2023-01-11T22:51:00.6500172Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6500298Z traceback.print_stack() 2023-01-11T22:51:00.6500694Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:51:00.6501436Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6501565Z File "", line 1, in 2023-01-11T22:51:00.6501773Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6501896Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6502191Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6502339Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6502552Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6502656Z self.run() 2023-01-11T22:51:00.6502859Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6503000Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6503341Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6503459Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6503816Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6503935Z getattr(self, test_name)() 2023-01-11T22:51:00.6504289Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6504387Z fn() 2023-01-11T22:51:00.6504742Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6504921Z test(self, **param_kwargs) 2023-01-11T22:51:00.6505286Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6505394Z return func(*args, **kwargs) 2023-01-11T22:51:00.6505669Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.6505780Z self.run_subtests( 2023-01-11T22:51:00.6506129Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6506288Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6506647Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6506801Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6507173Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6507277Z output = model(*input) 2023-01-11T22:51:00.6507599Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6507734Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6508105Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6508273Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6508633Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:51:00.6508750Z _lazy_init(state, module) 2023-01-11T22:51:00.6509099Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6509248Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6509645Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6509783Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6510115Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6510234Z return func(*args, **kwargs) 2023-01-11T22:51:00.6510610Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6510706Z p_assert( 2023-01-11T22:51:00.6511037Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6511200Z traceback.print_stack() 2023-01-11T22:51:00.6511447Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 0 2023-01-11T22:51:00.6511688Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 1 2023-01-11T22:51:00.6512088Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:51:00.6512834Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.6512961Z File "", line 1, in 2023-01-11T22:51:00.6513167Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6513304Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6513508Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6513642Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6513891Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6513998Z self.run() 2023-01-11T22:51:00.6514196Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6514339Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6514679Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6514807Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6515149Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6515270Z getattr(self, test_name)() 2023-01-11T22:51:00.6515621Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6515721Z fn() 2023-01-11T22:51:00.6516084Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6516203Z test(self, **param_kwargs) 2023-01-11T22:51:00.6516554Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6516674Z return func(*args, **kwargs) 2023-01-11T22:51:00.6516931Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.6517042Z self.run_subtests( 2023-01-11T22:51:00.6517387Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6517544Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6517904Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6518057Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6518430Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6518550Z output = model(*input) 2023-01-11T22:51:00.6518858Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6518993Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6519361Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6519532Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6519889Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.6520005Z _lazy_init(state, module) 2023-01-11T22:51:00.6520410Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6520580Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6520972Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6521097Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6521427Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6521549Z return func(*args, **kwargs) 2023-01-11T22:51:00.6521924Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6522025Z p_assert( 2023-01-11T22:51:00.6522353Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6522482Z traceback.print_stack() 2023-01-11T22:51:00.6522875Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:51:00.6523649Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6523787Z File "", line 1, in 2023-01-11T22:51:00.6523999Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6524139Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6524339Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6524489Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6524697Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6524803Z self.run() 2023-01-11T22:51:00.6524986Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6525129Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6525470Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6525601Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6525954Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6526076Z getattr(self, test_name)() 2023-01-11T22:51:00.6526433Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6526530Z fn() 2023-01-11T22:51:00.6526876Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6527001Z test(self, **param_kwargs) 2023-01-11T22:51:00.6527349Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6527476Z return func(*args, **kwargs) 2023-01-11T22:51:00.6527748Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.6527857Z self.run_subtests( 2023-01-11T22:51:00.6528205Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6528359Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6528702Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6528852Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6529222Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6529422Z output = model(*input) 2023-01-11T22:51:00.6529754Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6529891Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6530262Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6530430Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6530774Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:51:00.6530892Z _lazy_init(state, module) 2023-01-11T22:51:00.6531238Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6531403Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6531800Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6531982Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6532330Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6532456Z return func(*args, **kwargs) 2023-01-11T22:51:00.6532815Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6532916Z p_assert( 2023-01-11T22:51:00.6533248Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6533370Z traceback.print_stack() 2023-01-11T22:51:00.6533614Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 0 2023-01-11T22:51:00.6533847Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 1 2023-01-11T22:51:00.6534242Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:51:00.6534984Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.6535112Z File "", line 1, in 2023-01-11T22:51:00.6535305Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6535445Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6535641Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6535785Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6535997Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6536098Z self.run() 2023-01-11T22:51:00.6536299Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6536442Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6536967Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6537104Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6537470Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6537591Z getattr(self, test_name)() 2023-01-11T22:51:00.6537943Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6538042Z fn() 2023-01-11T22:51:00.6538400Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6538610Z test(self, **param_kwargs) 2023-01-11T22:51:00.6538953Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6539081Z return func(*args, **kwargs) 2023-01-11T22:51:00.6539357Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.6539469Z self.run_subtests( 2023-01-11T22:51:00.6539820Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6539980Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6540339Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6540487Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6540843Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6540964Z output = model(*input) 2023-01-11T22:51:00.6541359Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6541505Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6541880Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6542049Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6542413Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.6542534Z _lazy_init(state, module) 2023-01-11T22:51:00.6542867Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6543032Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6543431Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6543575Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6543910Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6544031Z return func(*args, **kwargs) 2023-01-11T22:51:00.6544402Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6544502Z p_assert( 2023-01-11T22:51:00.6544821Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6544942Z traceback.print_stack() 2023-01-11T22:51:00.6545334Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:51:00.6546082Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6546212Z File "", line 1, in 2023-01-11T22:51:00.6546415Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6546556Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6546754Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6546903Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6547097Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6547199Z self.run() 2023-01-11T22:51:00.6547398Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6547597Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6547941Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6548074Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6548431Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6548536Z getattr(self, test_name)() 2023-01-11T22:51:00.6548887Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6548982Z fn() 2023-01-11T22:51:00.6549341Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6549457Z test(self, **param_kwargs) 2023-01-11T22:51:00.6549809Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6549935Z return func(*args, **kwargs) 2023-01-11T22:51:00.6550249Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:51:00.6550349Z self.run_subtests( 2023-01-11T22:51:00.6550701Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6550860Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6551215Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6551365Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6551732Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6551851Z output = model(*input) 2023-01-11T22:51:00.6552176Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6552299Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6552675Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6552847Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6553202Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:51:00.6553318Z _lazy_init(state, module) 2023-01-11T22:51:00.6553662Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6553824Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6554211Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6554353Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6554671Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6554791Z return func(*args, **kwargs) 2023-01-11T22:51:00.6555168Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6555266Z p_assert( 2023-01-11T22:51:00.6555601Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6555726Z traceback.print_stack() 2023-01-11T22:51:00.6555970Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 0 2023-01-11T22:51:00.6556209Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 1 2023-01-11T22:51:00.6556592Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2023-01-11T22:51:00.6557396Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref)
2023-01-11T22:51:00.6557789Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes.
2023-01-11T22:51:00.6558513Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref)
2023-01-11T22:51:00.6558750Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 0
2023-01-11T22:51:00.6558984Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 1
2023-01-11T22:51:00.6559371Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes.
2023-01-11T22:51:00.6559800Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes.
2023-01-11T22:51:00.6560045Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 0
2023-01-11T22:51:00.6560277Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 1
2023-01-11T22:51:00.6560671Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes.
2023-01-11T22:51:00.6561044Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes.
2023-01-11T22:51:00.6561277Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 0
2023-01-11T22:51:00.6561508Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 1
2023-01-11T22:51:00.6561895Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes.
2023-01-11T22:51:00.6562282Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes.
2023-01-11T22:51:00.6562513Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 0
2023-01-11T22:51:00.6562742Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 1
2023-01-11T22:51:00.6563126Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes.
2023-01-11T22:51:00.6563510Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes.
2023-01-11T22:51:00.6563733Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 0
2023-01-11T22:51:00.6563965Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 1
2023-01-11T22:51:00.6564350Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes.
2023-01-11T22:51:00.6564733Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes.
2023-01-11T22:51:00.6564960Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 0
2023-01-11T22:51:00.6565188Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 1
2023-01-11T22:51:00.6565573Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes.
2023-01-11T22:51:00.6566009Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes.
2023-01-11T22:51:00.6566748Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref)
2023-01-11T22:51:00.6566986Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 0
2023-01-11T22:51:00.6567202Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 1
2023-01-11T22:51:00.6567587Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes.
2023-01-11T22:51:00.6567969Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes.
2023-01-11T22:51:00.6568750Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref)
2023-01-11T22:51:00.6569495Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references.
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6570218Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6570955Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6571675Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6572399Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6573121Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.6573848Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6574568Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6575344Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6576065Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6576968Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6577781Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6578520Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6579237Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6579958Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6580674Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6581387Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6582105Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6582821Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6583536Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6584325Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6585044Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6585760Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.6586003Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 0 2023-01-11T22:51:00.6586279Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 1 2023-01-11T22:51:00.6586678Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 2023-01-11T22:51:00.6587065Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 2023-01-11T22:51:00.6587788Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6588800Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py:451: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:51:00.6588936Z shapes.append(param.shape) 2023-01-11T22:51:00.6589160Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 0 2023-01-11T22:51:00.6589393Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 1 2023-01-11T22:51:00.6589785Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 
2023-01-11T22:51:00.6590171Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:51:00.6590902Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6591142Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 0 2023-01-11T22:51:00.6591373Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 1 2023-01-11T22:51:00.6591758Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 2023-01-11T22:51:00.6592198Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 2023-01-11T22:51:00.6592932Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6593238Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 0 2023-01-11T22:51:00.6593454Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 1 2023-01-11T22:51:00.6593849Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:51:00.6594236Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 
2023-01-11T22:51:00.6594968Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6595211Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:26 to store for rank: 0 2023-01-11T22:51:00.6595487Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:26 to store for rank: 1 2023-01-11T22:51:00.6595883Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:26 with 2 nodes. 2023-01-11T22:51:00.6596267Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:26 with 2 nodes. 2023-01-11T22:51:00.6596993Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6597233Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:27 to store for rank: 0 2023-01-11T22:51:00.6597458Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:27 to store for rank: 1 2023-01-11T22:51:00.6597834Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:27 with 2 nodes. 2023-01-11T22:51:00.6598216Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:27 with 2 nodes. 2023-01-11T22:51:00.6598944Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6599181Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:28 to store for rank: 0 2023-01-11T22:51:00.6599413Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:28 to store for rank: 1 2023-01-11T22:51:00.6599798Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:28 with 2 nodes. 2023-01-11T22:51:00.6600184Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:28 with 2 nodes. 2023-01-11T22:51:00.6600911Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6601142Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:29 to store for rank: 0 2023-01-11T22:51:00.6601372Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:29 to store for rank: 1 2023-01-11T22:51:00.6601808Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:29 with 2 nodes. 2023-01-11T22:51:00.6602181Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:29 with 2 nodes. 2023-01-11T22:51:00.6602909Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6603146Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:30 to store for rank: 0 2023-01-11T22:51:00.6603373Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:30 to store for rank: 1 2023-01-11T22:51:00.6603756Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:30 with 2 nodes. 2023-01-11T22:51:00.6604142Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:30 with 2 nodes. 2023-01-11T22:51:00.6604929Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6605179Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:31 to store for rank: 0 2023-01-11T22:51:00.6605412Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:31 to store for rank: 1 2023-01-11T22:51:00.6605799Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:31 with 2 nodes. 2023-01-11T22:51:00.6606185Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:31 with 2 nodes. 2023-01-11T22:51:00.6606913Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.6607137Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:32 to store for rank: 0 2023-01-11T22:51:00.6607368Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:32 to store for rank: 1 2023-01-11T22:51:00.6607757Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:32 with 2 nodes. 2023-01-11T22:51:00.6608141Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:32 with 2 nodes. 2023-01-11T22:51:00.6608874Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6609111Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:33 to store for rank: 0 2023-01-11T22:51:00.6609336Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:33 to store for rank: 1 2023-01-11T22:51:00.6609722Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:33 with 2 nodes. 2023-01-11T22:51:00.6610106Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:33 with 2 nodes. 2023-01-11T22:51:00.6610833Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6611619Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6612348Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6613075Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6613843Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6614564Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6615282Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.6616006Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6616923Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6617663Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6618393Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6619117Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6619838Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6620653Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6621369Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6622091Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6622865Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6623580Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6624302Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6625028Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6625745Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6626464Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6627185Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6627904Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.6628615Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6629387Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6630107Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6630826Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6631118Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:34 to store for rank: 0 2023-01-11T22:51:00.6631361Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:34 to store for rank: 1 2023-01-11T22:51:00.6631763Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:34 with 2 nodes. 2023-01-11T22:51:00.6632481Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6632868Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:34 with 2 nodes. 2023-01-11T22:51:00.6633590Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6633829Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:35 to store for rank: 0 2023-01-11T22:51:00.6634062Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:35 to store for rank: 1 2023-01-11T22:51:00.6634447Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:35 with 2 nodes. 2023-01-11T22:51:00.6635159Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6635547Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:35 with 2 nodes. 2023-01-11T22:51:00.6636260Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.6636480Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:36 to store for rank: 0 2023-01-11T22:51:00.6636706Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:36 to store for rank: 1 2023-01-11T22:51:00.6637088Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:36 with 2 nodes. 2023-01-11T22:51:00.6637862Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6638244Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:36 with 2 nodes. 2023-01-11T22:51:00.6638953Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6639182Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:37 to store for rank: 0 2023-01-11T22:51:00.6639413Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:37 to store for rank: 1 2023-01-11T22:51:00.6639792Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:37 with 2 nodes. 2023-01-11T22:51:00.6640555Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.6640945Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:37 with 2 nodes. 2023-01-11T22:51:00.6641652Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6641766Z dist init r=1, world=2 2023-01-11T22:51:00.6642078Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.6642390Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.6642697Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.6642999Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.6643300Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.6643600Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.6643902Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 
2023-01-11T22:51:00.6644197Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.6644494Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.6644789Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.6645084Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.6645437Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.6645533Z dist init r=0, world=2 2023-01-11T22:51:00.6645852Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.6646160Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.6646464Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.6646764Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.6647107Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 
2023-01-11T22:51:00.6647411Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.6647710Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.6648005Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.6648301Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.6648604Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.6648887Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.6649179Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.6649277Z ok (31.254s) 2023-01-11T22:51:00.6649612Z test_nested_always_wrap_model_offload_false_no_shard (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 94360 2023-01-11T22:51:00.6649826Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 94361 2023-01-11T22:51:00.6650201Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.6650377Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.6650759Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.6650946Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.6651293Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.6651465Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.6651832Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.6652018Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.6652255Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.6652549Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.6652954Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.6653347Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:51:00.6653572Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.6653780Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.6654012Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6654244Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6655293Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.6655415Z warnings.warn( 2023-01-11T22:51:00.6656421Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.6656686Z warnings.warn( 2023-01-11T22:51:00.6656942Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6657169Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6657396Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.6657625Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6657835Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6658062Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6658285Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6658511Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6658731Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6658959Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6659178Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6659401Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6660158Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6660890Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6661703Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6662438Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6663162Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6663988Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6664730Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6665456Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6666188Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6666911Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6667631Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6668358Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6669073Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6669794Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.6670572Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6671292Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6672014Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6672778Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6673501Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6674220Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6674946Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6675665Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6676381Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6677093Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6677808Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6678523Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6679293Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6680005Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6680725Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6681483Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6681723Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6681955Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6682183Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.6682412Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6682636Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6682860Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6683074Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6683300Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6683526Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6683747Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6683965Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6684183Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6684403Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6684621Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6684826Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6685049Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6685273Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6685491Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6686229Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6686954Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6687742Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6688465Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6689182Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6689943Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.6690676Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6691395Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6692170Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6692893Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6693609Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6694328Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6695043Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6695759Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6696695Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6697434Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6698151Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6698948Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6699682Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6700400Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6701127Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6701843Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6702558Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.6703277Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6703994Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6704709Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6705496Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6706217Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6706932Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6707689Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6708411Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6709128Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6709849Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6710562Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6711278Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6711996Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6712711Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6713424Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6714193Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.6714907Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.6715137Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6715364Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6715577Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6715804Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6716027Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6716290Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6716515Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6716737Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6716963Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6717189Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6717394Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6717617Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6717842Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6718065Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6718286Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6718504Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.6718610Z dist init r=1, world=2 2023-01-11T22:51:00.6718717Z dist init r=0, world=2 2023-01-11T22:51:00.6718801Z ok (6.013s) 2023-01-11T22:51:00.6719130Z test_nested_always_wrap_model_offload_false_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 94443 2023-01-11T22:51:00.6719343Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 94444 2023-01-11T22:51:00.6719718Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.6719896Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.6720272Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.6720462Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.6720823Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.6720979Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.6721349Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.6721534Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.6721772Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.6722073Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.6722473Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.6722863Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:51:00.6723086Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.6723309Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.6723523Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6723751Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6724815Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.6724936Z warnings.warn( 2023-01-11T22:51:00.6725945Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.6726054Z warnings.warn( 2023-01-11T22:51:00.6726274Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6726504Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6726730Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.6726960Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6727181Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6727389Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6727613Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6727835Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6728055Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6728277Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6728501Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6728723Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6728943Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6729148Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6729365Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6729585Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6729807Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6730025Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6730296Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6730518Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.6730738Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6730959Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6731162Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6731385Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6731600Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6731819Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6732036Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6732259Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6732518Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6732742Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6732945Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6733165Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6733383Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6733603Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6733822Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6734040Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6734263Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.6734488Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6734691Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6734910Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6735128Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6735346Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6735563Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6735781Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6735998Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6736221Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6736334Z dist init r=1, world=2 2023-01-11T22:51:00.6736430Z dist init r=0, world=2 2023-01-11T22:51:00.6736683Z ok (6.315s) 2023-01-11T22:51:00.6737047Z test_nested_always_wrap_model_offload_false_shard_grad_op (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 94526 2023-01-11T22:51:00.6737264Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 94527 2023-01-11T22:51:00.6737646Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.6737818Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.6738190Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.6738461Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.6738818Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.6738989Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.6739363Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.6739550Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.6739790Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.6740030Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.6740423Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.6740814Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:51:00.6741091Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.6741309Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.6741540Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6741770Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6742775Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.6742884Z warnings.warn( 2023-01-11T22:51:00.6743886Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.6743996Z warnings.warn( 2023-01-11T22:51:00.6744224Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6744448Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6744673Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.6744889Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6745110Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6745331Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6745551Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6745769Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6745987Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6746207Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6746427Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6746648Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6746905Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6747130Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6747350Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6747571Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6747789Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6748007Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6748228Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6748446Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.6748652Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6748875Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6749143Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6749367Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6749588Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6749805Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6750024Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6750242Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6750464Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6750665Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6750889Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6751105Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6751324Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6751546Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6751764Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6751981Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6752201Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.6752402Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6752626Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6752843Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6753064Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6753282Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6753500Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6753718Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6753937Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6754139Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6754252Z dist init r=0, world=2 2023-01-11T22:51:00.6754358Z dist init r=1, world=2 2023-01-11T22:51:00.6754508Z ok (6.315s) 2023-01-11T22:51:00.6754846Z test_nested_always_wrap_model_offload_true_no_shard (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 94609 2023-01-11T22:51:00.6755059Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 94610 2023-01-11T22:51:00.6755439Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.6755612Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.6755971Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.6756158Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.6756518Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.6756691Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.6757063Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.6757294Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.6757541Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.6757781Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.6758177Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.6758549Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:51:00.6758772Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.6758999Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.6759227Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6759455Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6760456Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.6760567Z warnings.warn( 2023-01-11T22:51:00.6761570Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:51:00.6761681Z warnings.warn( 2023-01-11T22:51:00.6761809Z File "<string>", line 1, in <module> 2023-01-11T22:51:00.6762017Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6762142Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6762341Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6762483Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6762689Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6762791Z self.run() 2023-01-11T22:51:00.6762989Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6763188Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6763517Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6763648Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6764005Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6764124Z getattr(self, test_name)() 2023-01-11T22:51:00.6764476Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6764572Z fn() 2023-01-11T22:51:00.6764928Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6765048Z test(self, **param_kwargs) 2023-01-11T22:51:00.6765385Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6765513Z return func(*args, **kwargs) 2023-01-11T22:51:00.6765808Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.6765924Z self.run_subtests( 2023-01-11T22:51:00.6766277Z File
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6766435Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6766796Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6766946Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6767304Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6767418Z output = model(*input) 2023-01-11T22:51:00.6767738Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6767874Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6768249Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6768421Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6768784Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.6768903Z _lazy_init(state, module) 2023-01-11T22:51:00.6769237Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6769402Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6769795Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6769937Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6770275Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6770400Z return func(*args, **kwargs) 2023-01-11T22:51:00.6770772Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6770876Z p_assert( 2023-01-11T22:51:00.6771193Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6771317Z traceback.print_stack() 2023-01-11T22:51:00.6771444Z File "", line 1, in 2023-01-11T22:51:00.6771645Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6771785Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6771984Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6772194Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6772403Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6772489Z self.run() 2023-01-11T22:51:00.6772692Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6772834Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6773171Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6773304Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6773659Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6773781Z getattr(self, test_name)() 2023-01-11T22:51:00.6774135Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6774217Z fn() 2023-01-11T22:51:00.6774580Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6774699Z test(self, **param_kwargs) 2023-01-11T22:51:00.6775095Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.6775222Z return func(*args, **kwargs) 2023-01-11T22:51:00.6775473Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.6775587Z self.run_subtests( 2023-01-11T22:51:00.6775938Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6776082Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6776440Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6776820Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6777221Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6777338Z output = model(*input) 2023-01-11T22:51:00.6777664Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6777800Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6778170Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6778327Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6778690Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.6778808Z _lazy_init(state, module) 2023-01-11T22:51:00.6779155Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6779319Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6779713Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6779855Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.6780188Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6780296Z return func(*args, **kwargs) 2023-01-11T22:51:00.6780717Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6780821Z p_assert( 2023-01-11T22:51:00.6781158Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6781281Z traceback.print_stack() 2023-01-11T22:51:00.6781513Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6781834Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6781961Z File "", line 1, in 2023-01-11T22:51:00.6782156Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6782296Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6782491Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6782636Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6782841Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6782938Z self.run() 2023-01-11T22:51:00.6783131Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6783259Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6783602Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6783736Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6784093Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6784286Z getattr(self, test_name)() 2023-01-11T22:51:00.6784657Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6784751Z fn() 2023-01-11T22:51:00.6785107Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6785212Z test(self, **param_kwargs) 2023-01-11T22:51:00.6785561Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6785685Z return func(*args, **kwargs) 2023-01-11T22:51:00.6785938Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.6786053Z self.run_subtests( 2023-01-11T22:51:00.6786403Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6786562Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6786921Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6787055Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6787427Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6787545Z output = model(*input) 2023-01-11T22:51:00.6787868Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6788002Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6788369Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6788544Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6788909Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.6789013Z _lazy_init(state, module) 2023-01-11T22:51:00.6789362Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6789525Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6789918Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6790057Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6790395Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6790514Z return func(*args, **kwargs) 2023-01-11T22:51:00.6790947Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6791033Z p_assert( 2023-01-11T22:51:00.6791373Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6791497Z traceback.print_stack() 2023-01-11T22:51:00.6791622Z File "", line 1, in 2023-01-11T22:51:00.6791869Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6792012Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6792206Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6792351Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6792546Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6792649Z self.run() 2023-01-11T22:51:00.6792854Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6792994Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6793377Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6793512Z 
self.run_test(test_name, pipe) 2023-01-11T22:51:00.6793872Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6793991Z getattr(self, test_name)() 2023-01-11T22:51:00.6794331Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6794428Z fn() 2023-01-11T22:51:00.6794785Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6794905Z test(self, **param_kwargs) 2023-01-11T22:51:00.6795253Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6795380Z return func(*args, **kwargs) 2023-01-11T22:51:00.6795628Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.6795725Z self.run_subtests( 2023-01-11T22:51:00.6796072Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6796226Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6796584Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6796735Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6797103Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6797216Z output = model(*input) 2023-01-11T22:51:00.6797536Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6797656Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6798027Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:51:00.6798196Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6798553Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.6798668Z _lazy_init(state, module) 2023-01-11T22:51:00.6799014Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6799173Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6799565Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6799789Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6800111Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6800234Z return func(*args, **kwargs) 2023-01-11T22:51:00.6800606Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6800702Z p_assert( 2023-01-11T22:51:00.6801031Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6801156Z traceback.print_stack() 2023-01-11T22:51:00.6801387Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6801618Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.6801730Z File "", line 1, in 2023-01-11T22:51:00.6801933Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6802076Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6802315Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6802469Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6802674Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6802776Z self.run() 2023-01-11T22:51:00.6802958Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6803099Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6803440Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6803571Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6803923Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6804047Z getattr(self, test_name)() 2023-01-11T22:51:00.6804398Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6804493Z fn() 2023-01-11T22:51:00.6804839Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6804957Z test(self, **param_kwargs) 2023-01-11T22:51:00.6805307Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6805429Z return func(*args, **kwargs) 2023-01-11T22:51:00.6805681Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.6805791Z self.run_subtests( 2023-01-11T22:51:00.6806137Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6806297Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6806640Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6806791Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6807162Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6807277Z output = model(*input) 2023-01-11T22:51:00.6807599Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6807734Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6808108Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6808279Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6808625Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.6808805Z _lazy_init(state, module) 2023-01-11T22:51:00.6809156Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6809319Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6809712Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6809848Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6810179Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6810295Z return func(*args, **kwargs) 2023-01-11T22:51:00.6810650Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6810746Z p_assert( 2023-01-11T22:51:00.6811079Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6811201Z traceback.print_stack() 2023-01-11T22:51:00.6811320Z File "", line 1, in 2023-01-11T22:51:00.6811569Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6811714Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6811907Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6812039Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6812244Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6812344Z self.run() 2023-01-11T22:51:00.6812538Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6812680Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6813014Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6813150Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6813495Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6813612Z getattr(self, test_name)() 2023-01-11T22:51:00.6813967Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6814060Z fn() 2023-01-11T22:51:00.6814415Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6814533Z test(self, **param_kwargs) 2023-01-11T22:51:00.6814884Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.6815003Z return func(*args, **kwargs) 2023-01-11T22:51:00.6815238Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.6815349Z self.run_subtests( 2023-01-11T22:51:00.6815699Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6815857Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6816213Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6816359Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6816908Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6817031Z output = model(*input) 2023-01-11T22:51:00.6817343Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6817481Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6817854Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6818108Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6818479Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.6818598Z _lazy_init(state, module) 2023-01-11T22:51:00.6818948Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6819112Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6819504Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6819628Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.6819953Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6820076Z return func(*args, **kwargs) 2023-01-11T22:51:00.6820447Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6820607Z p_assert( 2023-01-11T22:51:00.6820954Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6821076Z traceback.print_stack() 2023-01-11T22:51:00.6821293Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6821527Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6821652Z File "", line 1, in 2023-01-11T22:51:00.6821858Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6821997Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6822194Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6822343Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6822549Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6822637Z self.run() 2023-01-11T22:51:00.6822830Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6822971Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6823309Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6823437Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6823790Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6823908Z getattr(self, test_name)() 2023-01-11T22:51:00.6824259Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6824340Z fn() 2023-01-11T22:51:00.6824702Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6824822Z test(self, **param_kwargs) 2023-01-11T22:51:00.6825173Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6825292Z return func(*args, **kwargs) 2023-01-11T22:51:00.6825539Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.6825648Z self.run_subtests( 2023-01-11T22:51:00.6825995Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6826138Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6826496Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6826828Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6827202Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6827319Z output = model(*input) 2023-01-11T22:51:00.6827643Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6827781Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6828147Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6828303Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6828665Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.6828780Z _lazy_init(state, module) 2023-01-11T22:51:00.6829127Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6829294Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6829730Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6829875Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6830212Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6830320Z return func(*args, **kwargs) 2023-01-11T22:51:00.6830694Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6830794Z p_assert( 2023-01-11T22:51:00.6831124Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6831246Z traceback.print_stack() 2023-01-11T22:51:00.6831373Z File "", line 1, in 2023-01-11T22:51:00.6831579Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6831716Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6831902Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6832045Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6832250Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6832350Z self.run() 2023-01-11T22:51:00.6832548Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6832689Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6833022Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6833137Z 
self.run_test(test_name, pipe) 2023-01-11T22:51:00.6833492Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6833614Z getattr(self, test_name)() 2023-01-11T22:51:00.6833967Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6834066Z fn() 2023-01-11T22:51:00.6834426Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6834544Z test(self, **param_kwargs) 2023-01-11T22:51:00.6834893Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6835000Z return func(*args, **kwargs) 2023-01-11T22:51:00.6835252Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.6835358Z self.run_subtests( 2023-01-11T22:51:00.6835702Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6835916Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6836280Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6836429Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6836803Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6836904Z output = model(*input) 2023-01-11T22:51:00.6837226Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6837361Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6837730Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:51:00.6837899Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6838258Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.6838380Z _lazy_init(state, module) 2023-01-11T22:51:00.6838774Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6838931Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6839326Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6839465Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6839798Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6839917Z return func(*args, **kwargs) 2023-01-11T22:51:00.6840285Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6840391Z p_assert( 2023-01-11T22:51:00.6840722Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6840831Z traceback.print_stack() 2023-01-11T22:51:00.6841067Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6841300Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.6841425Z File "", line 1, in 2023-01-11T22:51:00.6841630Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6841769Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6841962Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6842106Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6842301Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6842400Z self.run() 2023-01-11T22:51:00.6842599Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6842738Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6843074Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6843198Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6843547Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6843663Z getattr(self, test_name)() 2023-01-11T22:51:00.6843999Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6844091Z fn() 2023-01-11T22:51:00.6844449Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6844569Z test(self, **param_kwargs) 2023-01-11T22:51:00.6844980Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6845106Z return func(*args, **kwargs) 2023-01-11T22:51:00.6845358Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.6845469Z self.run_subtests( 2023-01-11T22:51:00.6845803Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6845958Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6846314Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6846462Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6846828Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6846942Z output = model(*input) 2023-01-11T22:51:00.6847264Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6847396Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6847811Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6847987Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6848354Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.6848469Z _lazy_init(state, module) 2023-01-11T22:51:00.6848816Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6848980Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6849370Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6849512Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6849833Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6849957Z return func(*args, **kwargs) 2023-01-11T22:51:00.6850331Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6850430Z p_assert( 2023-01-11T22:51:00.6850759Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6850880Z traceback.print_stack() 2023-01-11T22:51:00.6851005Z File "", line 1, in 2023-01-11T22:51:00.6851205Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6851330Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6851527Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6851677Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6851880Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6851980Z self.run() 2023-01-11T22:51:00.6852176Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6852316Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6852638Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6852770Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6853124Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6853244Z getattr(self, test_name)() 2023-01-11T22:51:00.6853595Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6853742Z fn() 2023-01-11T22:51:00.6854103Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6854226Z test(self, **param_kwargs) 2023-01-11T22:51:00.6854561Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.6854680Z return func(*args, **kwargs) 2023-01-11T22:51:00.6854931Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.6855041Z self.run_subtests( 2023-01-11T22:51:00.6855384Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6855542Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6855899Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6856048Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6856403Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6856802Z output = model(*input) 2023-01-11T22:51:00.6857165Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6857303Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6857673Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6857839Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6858202Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.6858320Z _lazy_init(state, module) 2023-01-11T22:51:00.6858653Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6858820Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6859216Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6859352Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.6859684Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6859805Z return func(*args, **kwargs) 2023-01-11T22:51:00.6860174Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6860268Z p_assert( 2023-01-11T22:51:00.6860581Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6860704Z traceback.print_stack() 2023-01-11T22:51:00.6860936Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6861168Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6861295Z File "", line 1, in 2023-01-11T22:51:00.6861501Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6861638Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6861830Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6861962Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6862163Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6862260Z self.run() 2023-01-11T22:51:00.6862455Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6862593Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6862928Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6863137Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6863502Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6863607Z getattr(self, test_name)() 2023-01-11T22:51:00.6863962Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6864054Z fn() 2023-01-11T22:51:00.6864409Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6864530Z test(self, **param_kwargs) 2023-01-11T22:51:00.6864876Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6864992Z return func(*args, **kwargs) 2023-01-11T22:51:00.6865227Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.6865343Z self.run_subtests( 2023-01-11T22:51:00.6865747Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6865913Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6866273Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6866423Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6866795Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6866912Z output = model(*input) 2023-01-11T22:51:00.6867227Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6867348Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6867721Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6867891Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6868254Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.6868372Z _lazy_init(state, module) 2023-01-11T22:51:00.6868718Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6868880Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6869273Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6869398Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6869731Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6869854Z return func(*args, **kwargs) 2023-01-11T22:51:00.6870221Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6870318Z p_assert( 2023-01-11T22:51:00.6870650Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6870772Z traceback.print_stack() 2023-01-11T22:51:00.6870895Z File "", line 1, in 2023-01-11T22:51:00.6871083Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6871219Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6871413Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6871559Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6871765Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6871921Z self.run() 2023-01-11T22:51:00.6872119Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6872247Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6872588Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6872715Z 
self.run_test(test_name, pipe) 2023-01-11T22:51:00.6873071Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6873189Z getattr(self, test_name)() 2023-01-11T22:51:00.6873539Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6873630Z fn() 2023-01-11T22:51:00.6873984Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6874089Z test(self, **param_kwargs) 2023-01-11T22:51:00.6874442Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6874561Z return func(*args, **kwargs) 2023-01-11T22:51:00.6874855Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.6874970Z self.run_subtests( 2023-01-11T22:51:00.6875313Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6875470Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6875824Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6875957Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6876320Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6876439Z output = model(*input) 2023-01-11T22:51:00.6876758Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6876896Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6877260Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:51:00.6877428Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6877788Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.6877890Z _lazy_init(state, module) 2023-01-11T22:51:00.6878237Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6878397Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6878789Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6878928Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6879263Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6879383Z return func(*args, **kwargs) 2023-01-11T22:51:00.6879757Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6879841Z p_assert( 2023-01-11T22:51:00.6880169Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6880291Z traceback.print_stack() 2023-01-11T22:51:00.6880519Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6880749Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.6880870Z File "", line 1, in 2023-01-11T22:51:00.6881130Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6881270Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6881457Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6881599Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6881802Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6881905Z self.run() 2023-01-11T22:51:00.6882101Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6882243Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6882582Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6882712Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6883052Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6883175Z getattr(self, test_name)() 2023-01-11T22:51:00.6883576Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6883677Z fn() 2023-01-11T22:51:00.6884039Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6884159Z test(self, **param_kwargs) 2023-01-11T22:51:00.6884512Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6884620Z return func(*args, **kwargs) 2023-01-11T22:51:00.6884873Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.6884980Z self.run_subtests( 2023-01-11T22:51:00.6885321Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6885479Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6885844Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6885991Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6886362Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6886465Z output = model(*input) 2023-01-11T22:51:00.6886786Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6886919Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6887289Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6887456Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6887819Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.6887936Z _lazy_init(state, module) 2023-01-11T22:51:00.6888284Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6888445Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6888823Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6888961Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6889290Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6889412Z return func(*args, **kwargs) 2023-01-11T22:51:00.6889780Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6889932Z p_assert( 2023-01-11T22:51:00.6890263Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6890384Z traceback.print_stack() 2023-01-11T22:51:00.6890499Z File "", line 1, in 2023-01-11T22:51:00.6890705Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6890842Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6891040Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6891188Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6891398Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6891500Z self.run() 2023-01-11T22:51:00.6891684Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6891826Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6892215Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6892345Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6892758Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6892882Z getattr(self, test_name)() 2023-01-11T22:51:00.6893240Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6893335Z fn() 2023-01-11T22:51:00.6893677Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6893800Z test(self, **param_kwargs) 2023-01-11T22:51:00.6894152Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.6894275Z return func(*args, **kwargs) 2023-01-11T22:51:00.6894533Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.6894646Z self.run_subtests( 2023-01-11T22:51:00.6894992Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6895149Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6895494Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6895637Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6896002Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6896116Z output = model(*input) 2023-01-11T22:51:00.6896433Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6896789Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6897187Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6897364Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6897713Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.6897831Z _lazy_init(state, module) 2023-01-11T22:51:00.6898179Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6898340Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6898729Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6898864Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.6899193Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6899398Z return func(*args, **kwargs) 2023-01-11T22:51:00.6899764Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6899863Z p_assert( 2023-01-11T22:51:00.6900194Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6900319Z traceback.print_stack() 2023-01-11T22:51:00.6900556Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6900789Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6900913Z File "", line 1, in 2023-01-11T22:51:00.6901114Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6901238Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6901441Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6901587Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6901851Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6901960Z self.run() 2023-01-11T22:51:00.6902157Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6902300Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6902622Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6902755Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6903111Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6903230Z getattr(self, test_name)() 2023-01-11T22:51:00.6903586Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6903685Z fn() 2023-01-11T22:51:00.6904041Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6904162Z test(self, **param_kwargs) 2023-01-11T22:51:00.6904496Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6904620Z return func(*args, **kwargs) 2023-01-11T22:51:00.6904872Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.6904984Z self.run_subtests( 2023-01-11T22:51:00.6905326Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6905484Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6905838Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6905989Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6906345Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6906460Z output = model(*input) 2023-01-11T22:51:00.6906776Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6906911Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6907280Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6907450Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6907805Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.6907919Z _lazy_init(state, module) 2023-01-11T22:51:00.6908252Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6908467Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6908867Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6909008Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6909341Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6909464Z return func(*args, **kwargs) 2023-01-11T22:51:00.6909837Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6909935Z p_assert( 2023-01-11T22:51:00.6910260Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6910368Z traceback.print_stack() 2023-01-11T22:51:00.6910493Z File "", line 1, in 2023-01-11T22:51:00.6910696Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6910891Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6911095Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6911243Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6911452Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6911538Z self.run() 2023-01-11T22:51:00.6911736Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6911881Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6912220Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6912348Z 
self.run_test(test_name, pipe) 2023-01-11T22:51:00.6912705Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6912827Z getattr(self, test_name)() 2023-01-11T22:51:00.6913175Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6913256Z fn() 2023-01-11T22:51:00.6913618Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6913738Z test(self, **param_kwargs) 2023-01-11T22:51:00.6914086Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6914205Z return func(*args, **kwargs) 2023-01-11T22:51:00.6914454Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.6914559Z self.run_subtests( 2023-01-11T22:51:00.6914906Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6915054Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6915414Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6915563Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6915934Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6916047Z output = model(*input) 2023-01-11T22:51:00.6916367Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6916501Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6916869Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:51:00.6917026Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6917449Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.6917568Z _lazy_init(state, module) 2023-01-11T22:51:00.6917919Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6918080Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6918472Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6918613Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6918944Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6919051Z return func(*args, **kwargs) 2023-01-11T22:51:00.6919423Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6919525Z p_assert( 2023-01-11T22:51:00.6919859Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6920023Z traceback.print_stack() 2023-01-11T22:51:00.6920264Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6920497Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.6920621Z File "", line 1, in 2023-01-11T22:51:00.6920812Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6920949Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6921145Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6921291Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6921496Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6921600Z self.run() 2023-01-11T22:51:00.6921794Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6921925Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6922262Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6922387Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6922738Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6922858Z getattr(self, test_name)() 2023-01-11T22:51:00.6923210Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6923304Z fn() 2023-01-11T22:51:00.6923659Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6923767Z test(self, **param_kwargs) 2023-01-11T22:51:00.6924117Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6924238Z return func(*args, **kwargs) 2023-01-11T22:51:00.6924489Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.6924598Z self.run_subtests( 2023-01-11T22:51:00.6924940Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6925096Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6925452Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6925585Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6925953Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6926116Z output = model(*input) 2023-01-11T22:51:00.6926434Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6926573Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6926945Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6927114Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6927475Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.6927578Z _lazy_init(state, module) 2023-01-11T22:51:00.6927923Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6928085Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6928469Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6928607Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6928980Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6929109Z return func(*args, **kwargs) 2023-01-11T22:51:00.6929484Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6929569Z p_assert( 2023-01-11T22:51:00.6929903Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6930024Z traceback.print_stack() 2023-01-11T22:51:00.6930149Z File "", line 1, in 2023-01-11T22:51:00.6930351Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6930488Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6930690Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6930839Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6931035Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6931133Z self.run() 2023-01-11T22:51:00.6931331Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6931471Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6931802Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6931930Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6932283Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6932401Z getattr(self, test_name)() 2023-01-11T22:51:00.6932737Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6932835Z fn() 2023-01-11T22:51:00.6933195Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6933313Z test(self, **param_kwargs) 2023-01-11T22:51:00.6933665Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.6933783Z return func(*args, **kwargs) 2023-01-11T22:51:00.6934031Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.6934127Z self.run_subtests( 2023-01-11T22:51:00.6934471Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6934629Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6934985Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6935186Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6935560Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6935676Z output = model(*input) 2023-01-11T22:51:00.6935997Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6936130Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6936484Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6936889Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6937263Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.6937384Z _lazy_init(state, module) 2023-01-11T22:51:00.6937737Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6937898Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6938397Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6938551Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.6938875Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6938997Z return func(*args, **kwargs) 2023-01-11T22:51:00.6939368Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6939471Z p_assert( 2023-01-11T22:51:00.6939806Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6939933Z traceback.print_stack() 2023-01-11T22:51:00.6940160Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6940393Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6940506Z File "", line 1, in 2023-01-11T22:51:00.6940710Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6940852Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6941049Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6941195Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6941405Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6941506Z self.run() 2023-01-11T22:51:00.6941689Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6941831Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6942169Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6942301Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6942658Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6942778Z getattr(self, test_name)() 2023-01-11T22:51:00.6943130Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6943222Z fn() 2023-01-11T22:51:00.6943567Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6943679Z test(self, **param_kwargs) 2023-01-11T22:51:00.6944022Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6944145Z return func(*args, **kwargs) 2023-01-11T22:51:00.6944469Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.6944578Z self.run_subtests( 2023-01-11T22:51:00.6944934Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6945092Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6945435Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6945583Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6945953Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6946066Z output = model(*input) 2023-01-11T22:51:00.6946380Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6946522Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6946885Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6947096Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6947452Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.6947571Z _lazy_init(state, module) 2023-01-11T22:51:00.6947914Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6948073Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6948457Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6948593Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6948919Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6949045Z return func(*args, **kwargs) 2023-01-11T22:51:00.6949404Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6949505Z p_assert( 2023-01-11T22:51:00.6949833Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6949951Z traceback.print_stack() 2023-01-11T22:51:00.6950076Z File "", line 1, in 2023-01-11T22:51:00.6950284Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6950422Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6950615Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6950748Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6950954Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6951056Z self.run() 2023-01-11T22:51:00.6951250Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6951395Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6951725Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6951853Z 
self.run_test(test_name, pipe) 2023-01-11T22:51:00.6952191Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6952309Z getattr(self, test_name)() 2023-01-11T22:51:00.6952660Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6952753Z fn() 2023-01-11T22:51:00.6953110Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6953285Z test(self, **param_kwargs) 2023-01-11T22:51:00.6953636Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6953760Z return func(*args, **kwargs) 2023-01-11T22:51:00.6953995Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.6954103Z self.run_subtests( 2023-01-11T22:51:00.6954449Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6954606Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6954964Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6955113Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6955480Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6955598Z output = model(*input) 2023-01-11T22:51:00.6955949Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6956087Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6956462Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:51:00.6956632Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6956990Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.6957107Z _lazy_init(state, module) 2023-01-11T22:51:00.6957454Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6957612Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6958006Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6958130Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6958467Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6958587Z return func(*args, **kwargs) 2023-01-11T22:51:00.6958958Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6959056Z p_assert( 2023-01-11T22:51:00.6959382Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6959504Z traceback.print_stack() 2023-01-11T22:51:00.6959734Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6959950Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.6960076Z File "", line 1, in 2023-01-11T22:51:00.6960279Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6960420Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6960617Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6960763Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6960964Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6961050Z self.run() 2023-01-11T22:51:00.6961248Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6961388Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6961724Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6961854Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6962266Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6962382Z getattr(self, test_name)() 2023-01-11T22:51:00.6962740Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6962821Z fn() 2023-01-11T22:51:00.6963177Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6963296Z test(self, **param_kwargs) 2023-01-11T22:51:00.6963645Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6963766Z return func(*args, **kwargs) 2023-01-11T22:51:00.6964015Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.6964126Z self.run_subtests( 2023-01-11T22:51:00.6964473Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6964620Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6965023Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6965178Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6965553Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6965670Z output = model(*input) 2023-01-11T22:51:00.6965992Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6966125Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6966490Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6966644Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6967009Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.6967127Z _lazy_init(state, module) 2023-01-11T22:51:00.6967472Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6967632Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6968022Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6968158Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6968491Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6968598Z return func(*args, **kwargs) 2023-01-11T22:51:00.6968968Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6969072Z p_assert( 2023-01-11T22:51:00.6969399Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6969522Z traceback.print_stack() 2023-01-11T22:51:00.6969648Z File "", line 1, in 2023-01-11T22:51:00.6969851Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6969990Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6970172Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6970318Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6970524Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6970620Z self.run() 2023-01-11T22:51:00.6970818Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6971017Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6971351Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6971470Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6971824Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6971941Z getattr(self, test_name)() 2023-01-11T22:51:00.6972295Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6972390Z fn() 2023-01-11T22:51:00.6972748Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6972870Z test(self, **param_kwargs) 2023-01-11T22:51:00.6973221Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.6973332Z return func(*args, **kwargs) 2023-01-11T22:51:00.6973584Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.6973696Z self.run_subtests( 2023-01-11T22:51:00.6974101Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6974270Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6974627Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6974776Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6975141Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6975242Z output = model(*input) 2023-01-11T22:51:00.6975557Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6975692Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6976058Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6976229Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6976805Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.6976930Z _lazy_init(state, module) 2023-01-11T22:51:00.6977285Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6977436Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6977828Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6977966Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.6978303Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6978424Z return func(*args, **kwargs) 2023-01-11T22:51:00.6978800Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6978901Z p_assert( 2023-01-11T22:51:00.6979229Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6979337Z traceback.print_stack() 2023-01-11T22:51:00.6979569Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6979794Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6979921Z File "", line 1, in 2023-01-11T22:51:00.6980125Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6980260Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6980543Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6980687Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6980883Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6980981Z self.run() 2023-01-11T22:51:00.6981177Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6981321Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6981663Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6981793Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.6982146Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6982264Z getattr(self, test_name)() 2023-01-11T22:51:00.6982602Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6982696Z fn() 2023-01-11T22:51:00.6983121Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6983251Z test(self, **param_kwargs) 2023-01-11T22:51:00.6983607Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6983730Z return func(*args, **kwargs) 2023-01-11T22:51:00.6983982Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.6984090Z self.run_subtests( 2023-01-11T22:51:00.6984424Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6984582Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6984939Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6985092Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6985465Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6985580Z output = model(*input) 2023-01-11T22:51:00.6985902Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6986037Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6986390Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.6986559Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6986920Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.6987033Z _lazy_init(state, module) 2023-01-11T22:51:00.6987384Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6987550Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6987944Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6988083Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6988403Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6988519Z return func(*args, **kwargs) 2023-01-11T22:51:00.6988889Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6988986Z p_assert( 2023-01-11T22:51:00.6989316Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6989492Z traceback.print_stack() 2023-01-11T22:51:00.6989621Z File "", line 1, in 2023-01-11T22:51:00.6989825Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.6989955Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.6990149Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.6990293Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.6990499Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.6990595Z self.run() 2023-01-11T22:51:00.6990791Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.6990932Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.6991257Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.6991386Z 
self.run_test(test_name, pipe) 2023-01-11T22:51:00.6991747Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.6991865Z getattr(self, test_name)() 2023-01-11T22:51:00.6992317Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.6992419Z fn() 2023-01-11T22:51:00.6992781Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.6992898Z test(self, **param_kwargs) 2023-01-11T22:51:00.6993234Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.6993353Z return func(*args, **kwargs) 2023-01-11T22:51:00.6993596Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.6993703Z self.run_subtests( 2023-01-11T22:51:00.6994054Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.6994214Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.6994574Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.6994723Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.6995076Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.6995191Z output = model(*input) 2023-01-11T22:51:00.6995511Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.6995642Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.6996010Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:51:00.6996181Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.6996544Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.6996662Z _lazy_init(state, module) 2023-01-11T22:51:00.6996994Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.6997155Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.6997545Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.6997681Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.6998013Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.6998135Z return func(*args, **kwargs) 2023-01-11T22:51:00.6998507Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.6998658Z p_assert( 2023-01-11T22:51:00.6998980Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.6999107Z traceback.print_stack() 2023-01-11T22:51:00.6999340Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6999569Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.6999796Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7000021Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7000243Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.7000467Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7000679Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7000905Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7001167Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7001392Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7001608Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7001828Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7002572Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7003318Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7004051Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7004777Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7005500Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7006220Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7006939Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7007712Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7008433Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.7009147Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7009898Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7010626Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7011347Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7012075Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7012790Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7013506Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7014229Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7014941Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7015653Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7016428Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7016889Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.7017112Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7017334Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7017560Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7017780Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7018000Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7018231Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7018524Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7018761Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7018983Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7019191Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7019409Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7019628Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7019848Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7020070Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7020290Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7020511Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7020726Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.7021455Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7022177Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7022895Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7023622Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7024343Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7025145Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7025862Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7026582Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7027349Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7028076Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7028795Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7029524Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7030243Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7030958Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7031678Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7032396Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7033109Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.7033881Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7034594Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7035309Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7036106Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7036836Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7037549Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7038271Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7038985Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7039695Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7040417Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7041130Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7041842Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7042612Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7043327Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7044044Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7044801Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7045526Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.7046239Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7046961Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7047674Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7048391Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7049112Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7049824Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7050053Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7050279Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7050504Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7050779Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7051003Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7051212Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7051436Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7051663Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7051883Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7052103Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7052323Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7052542Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7052760Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7053009Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7053233Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7053450Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.7053665Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7053888Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7053999Z dist init r=0, world=2 2023-01-11T22:51:00.7054324Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.7054637Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.7054945Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.7055245Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.7055528Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.7055820Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.7056116Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.7056418Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.7056940Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.7057243Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.7057538Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.7057839Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.7058032Z dist init r=1, world=2 2023-01-11T22:51:00.7058333Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.7058631Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.7058914Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.7059208Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.7059501Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.7059796Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.7060147Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.7060449Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.7060742Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.7061037Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.7061331Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.7061631Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.7061728Z ok (6.415s) 2023-01-11T22:51:00.7062040Z test_nested_always_wrap_model_offload_true_none (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 94692 2023-01-11T22:51:00.7062248Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 94693 2023-01-11T22:51:00.7062627Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.7062801Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.7063181Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.7063376Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.7063742Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.7063912Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.7064285Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.7064457Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.7064694Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.7064934Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.7065327Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.7065770Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:51:00.7066000Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.7066225Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.7066454Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7066682Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7067690Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.7067790Z warnings.warn( 2023-01-11T22:51:00.7068834Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:51:00.7068950Z warnings.warn( 2023-01-11T22:51:00.7069078Z File "", line 1, in 2023-01-11T22:51:00.7069286Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7069425Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7069623Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7069773Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7069983Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7070072Z self.run() 2023-01-11T22:51:00.7070272Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7070415Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7070757Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7070888Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7071244Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7071366Z getattr(self, test_name)() 2023-01-11T22:51:00.7071721Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7071805Z fn() 2023-01-11T22:51:00.7072165Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7072286Z test(self, **param_kwargs) 2023-01-11T22:51:00.7072639Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7072762Z return func(*args, **kwargs) 2023-01-11T22:51:00.7073013Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7073126Z self.run_subtests( 2023-01-11T22:51:00.7073470Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7073614Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7073972Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7074202Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7074581Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7074698Z output = model(*input) 2023-01-11T22:51:00.7075024Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7075161Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7075536Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7075692Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7076056Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7076172Z _lazy_init(state, module) 2023-01-11T22:51:00.7076519Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7076689Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7077125Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7077271Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7077611Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7077719Z return func(*args, **kwargs) 2023-01-11T22:51:00.7078086Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7078183Z p_assert( 2023-01-11T22:51:00.7078512Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7078640Z traceback.print_stack() 2023-01-11T22:51:00.7078770Z File "", line 1, in 2023-01-11T22:51:00.7078981Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7079120Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7079305Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7079450Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7079655Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7079756Z self.run() 2023-01-11T22:51:00.7079952Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7080096Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7080431Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7080596Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7080961Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7081086Z getattr(self, test_name)() 2023-01-11T22:51:00.7081446Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7081543Z fn() 2023-01-11T22:51:00.7081900Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7082018Z test(self, **param_kwargs) 2023-01-11T22:51:00.7082368Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.7082476Z return func(*args, **kwargs) 2023-01-11T22:51:00.7082726Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7082837Z self.run_subtests( 2023-01-11T22:51:00.7083183Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7083405Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7083772Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7083922Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7084292Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7084395Z output = model(*input) 2023-01-11T22:51:00.7084711Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7084848Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7085216Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7085387Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7085753Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7085871Z _lazy_init(state, module) 2023-01-11T22:51:00.7086262Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7086419Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7086813Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7086955Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.7087290Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7087414Z return func(*args, **kwargs) 2023-01-11T22:51:00.7087785Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7087893Z p_assert( 2023-01-11T22:51:00.7088223Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7088331Z traceback.print_stack() 2023-01-11T22:51:00.7088566Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7088794Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7088920Z File "", line 1, in 2023-01-11T22:51:00.7089126Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7089263Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7089457Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7089604Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7089798Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7089899Z self.run() 2023-01-11T22:51:00.7090095Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7090234Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7090574Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7090701Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7091054Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7091173Z getattr(self, test_name)() 2023-01-11T22:51:00.7091511Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7091606Z fn() 2023-01-11T22:51:00.7092010Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7092134Z test(self, **param_kwargs) 2023-01-11T22:51:00.7092547Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7092666Z return func(*args, **kwargs) 2023-01-11T22:51:00.7092921Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7093017Z self.run_subtests( 2023-01-11T22:51:00.7093362Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7093518Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7093874Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7094019Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7094383Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7094504Z output = model(*input) 2023-01-11T22:51:00.7094823Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7095016Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7095383Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7095552Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7095913Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.7096031Z _lazy_init(state, module) 2023-01-11T22:51:00.7096375Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7096690Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7097108Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7097250Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7097574Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7097695Z return func(*args, **kwargs) 2023-01-11T22:51:00.7098066Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7098165Z p_assert( 2023-01-11T22:51:00.7098494Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7098617Z traceback.print_stack() 2023-01-11T22:51:00.7098739Z File "", line 1, in 2023-01-11T22:51:00.7098930Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7099066Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7099264Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7099416Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7099621Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7099721Z self.run() 2023-01-11T22:51:00.7099916Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7100059Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7100379Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7100506Z 
self.run_test(test_name, pipe) 2023-01-11T22:51:00.7100860Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7100975Z getattr(self, test_name)() 2023-01-11T22:51:00.7101325Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7101505Z fn() 2023-01-11T22:51:00.7101871Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7101994Z test(self, **param_kwargs) 2023-01-11T22:51:00.7102333Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7102452Z return func(*args, **kwargs) 2023-01-11T22:51:00.7102701Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7102810Z self.run_subtests( 2023-01-11T22:51:00.7103155Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7103316Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7103675Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7103826Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7104237Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7104358Z output = model(*input) 2023-01-11T22:51:00.7104681Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7104815Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7105181Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:51:00.7105350Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7105708Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7105824Z _lazy_init(state, module) 2023-01-11T22:51:00.7106156Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7106324Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7106715Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7106853Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7107181Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7107306Z return func(*args, **kwargs) 2023-01-11T22:51:00.7107677Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7107775Z p_assert( 2023-01-11T22:51:00.7108090Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7108209Z traceback.print_stack() 2023-01-11T22:51:00.7108443Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7108675Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.7108800Z File "", line 1, in 2023-01-11T22:51:00.7109005Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7109141Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7109337Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7109469Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7109671Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7109769Z self.run() 2023-01-11T22:51:00.7109962Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7110101Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7110508Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7110639Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7110983Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7111101Z getattr(self, test_name)() 2023-01-11T22:51:00.7111455Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7111547Z fn() 2023-01-11T22:51:00.7111901Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7112024Z test(self, **param_kwargs) 2023-01-11T22:51:00.7112374Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7112495Z return func(*args, **kwargs) 2023-01-11T22:51:00.7112730Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7112842Z self.run_subtests( 2023-01-11T22:51:00.7113235Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7113395Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7113759Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7113909Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7114276Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7114391Z output = model(*input) 2023-01-11T22:51:00.7114696Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7114829Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7115201Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7115375Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7115740Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7115857Z _lazy_init(state, module) 2023-01-11T22:51:00.7116201Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7116359Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7116749Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7116872Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7117200Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7117321Z return func(*args, **kwargs) 2023-01-11T22:51:00.7117694Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7117789Z p_assert( 2023-01-11T22:51:00.7118118Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7118234Z traceback.print_stack() 2023-01-11T22:51:00.7118345Z File "", line 1, in 2023-01-11T22:51:00.7118546Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7118684Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7118879Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7119024Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7119233Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7119393Z self.run() 2023-01-11T22:51:00.7119592Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7119724Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7120066Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7120194Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7120548Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7120668Z getattr(self, test_name)() 2023-01-11T22:51:00.7121017Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7121113Z fn() 2023-01-11T22:51:00.7121465Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7121572Z test(self, **param_kwargs) 2023-01-11T22:51:00.7121923Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.7122041Z return func(*args, **kwargs) 2023-01-11T22:51:00.7122335Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7122457Z self.run_subtests( 2023-01-11T22:51:00.7122808Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7122966Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7123323Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7123457Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7123826Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7123946Z output = model(*input) 2023-01-11T22:51:00.7124266Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7124403Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7124770Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7124940Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7125298Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7125401Z _lazy_init(state, module) 2023-01-11T22:51:00.7125746Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7125911Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7126303Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7126439Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.7126769Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7126890Z return func(*args, **kwargs) 2023-01-11T22:51:00.7127258Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7127344Z p_assert( 2023-01-11T22:51:00.7127672Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7127793Z traceback.print_stack() 2023-01-11T22:51:00.7128024Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7128256Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7128433Z File "", line 1, in 2023-01-11T22:51:00.7128633Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7128773Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7128959Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7129107Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7129313Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7129411Z self.run() 2023-01-11T22:51:00.7129602Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7129743Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7130078Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7130194Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7130550Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7130674Z getattr(self, test_name)() 2023-01-11T22:51:00.7131068Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7131167Z fn() 2023-01-11T22:51:00.7131523Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7131645Z test(self, **param_kwargs) 2023-01-11T22:51:00.7131998Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7132104Z return func(*args, **kwargs) 2023-01-11T22:51:00.7132354Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7132461Z self.run_subtests( 2023-01-11T22:51:00.7132805Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7132962Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7133322Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7133468Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7133834Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7133936Z output = model(*input) 2023-01-11T22:51:00.7134253Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7134388Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7134755Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7134924Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7135289Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.7135406Z _lazy_init(state, module) 2023-01-11T22:51:00.7135757Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7135906Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7136299Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7136436Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7136963Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7137089Z return func(*args, **kwargs) 2023-01-11T22:51:00.7137467Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7137647Z p_assert( 2023-01-11T22:51:00.7137985Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7138097Z traceback.print_stack() 2023-01-11T22:51:00.7138224Z File "", line 1, in 2023-01-11T22:51:00.7138426Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7138563Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7138761Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7138906Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7139110Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7139209Z self.run() 2023-01-11T22:51:00.7139393Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7139533Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7139868Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7139998Z 
self.run_test(test_name, pipe) 2023-01-11T22:51:00.7140413Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7140540Z getattr(self, test_name)() 2023-01-11T22:51:00.7140898Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7140993Z fn() 2023-01-11T22:51:00.7141338Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7141457Z test(self, **param_kwargs) 2023-01-11T22:51:00.7141812Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7141935Z return func(*args, **kwargs) 2023-01-11T22:51:00.7142188Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7142298Z self.run_subtests( 2023-01-11T22:51:00.7142650Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7142809Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7143152Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7143300Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7143668Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7143784Z output = model(*input) 2023-01-11T22:51:00.7144102Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7144240Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7144606Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:51:00.7144778Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7145125Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7145239Z _lazy_init(state, module) 2023-01-11T22:51:00.7145581Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7145746Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7146137Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7146276Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7146607Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7146783Z return func(*args, **kwargs) 2023-01-11T22:51:00.7147147Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7147250Z p_assert( 2023-01-11T22:51:00.7147582Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7147704Z traceback.print_stack() 2023-01-11T22:51:00.7147938Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7148168Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.7148293Z File "", line 1, in 2023-01-11T22:51:00.7148495Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7148619Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7148817Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7148962Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7149210Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7149317Z self.run() 2023-01-11T22:51:00.7149516Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7149661Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7149985Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7150117Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7150469Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7150586Z getattr(self, test_name)() 2023-01-11T22:51:00.7150940Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7151035Z fn() 2023-01-11T22:51:00.7151389Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7151508Z test(self, **param_kwargs) 2023-01-11T22:51:00.7151843Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7151959Z return func(*args, **kwargs) 2023-01-11T22:51:00.7152206Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7152311Z self.run_subtests( 2023-01-11T22:51:00.7152657Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7152814Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7153169Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7153321Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7153678Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7153794Z output = model(*input) 2023-01-11T22:51:00.7154111Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7154245Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7154612Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7154778Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7155135Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7155246Z _lazy_init(state, module) 2023-01-11T22:51:00.7155649Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7155812Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7156202Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7156339Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7156672Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7156794Z return func(*args, **kwargs) 2023-01-11T22:51:00.7157172Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7157274Z p_assert( 2023-01-11T22:51:00.7157588Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7157708Z traceback.print_stack() 2023-01-11T22:51:00.7157835Z File "", line 1, in 2023-01-11T22:51:00.7158036Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7158230Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7158434Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7158582Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7158792Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7158878Z self.run() 2023-01-11T22:51:00.7159076Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7159216Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7159547Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7159672Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7160028Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7160147Z getattr(self, test_name)() 2023-01-11T22:51:00.7160499Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7160580Z fn() 2023-01-11T22:51:00.7160936Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7161056Z test(self, **param_kwargs) 2023-01-11T22:51:00.7161405Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.7161527Z return func(*args, **kwargs) 2023-01-11T22:51:00.7161778Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7161889Z self.run_subtests( 2023-01-11T22:51:00.7162217Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7162376Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7162731Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7162879Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7163246Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7163363Z output = model(*input) 2023-01-11T22:51:00.7163681Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7163815Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7164179Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7164335Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7164756Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7164874Z _lazy_init(state, module) 2023-01-11T22:51:00.7165225Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7165388Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7165775Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7165911Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.7166238Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7166346Z return func(*args, **kwargs) 2023-01-11T22:51:00.7166715Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7166814Z p_assert( 2023-01-11T22:51:00.7167142Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7167311Z traceback.print_stack() 2023-01-11T22:51:00.7167550Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7167778Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7167905Z File "", line 1, in 2023-01-11T22:51:00.7168095Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7168233Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7168431Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7168576Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7168778Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7168880Z self.run() 2023-01-11T22:51:00.7169077Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7169207Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7169544Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7169670Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7170023Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7170143Z getattr(self, test_name)() 2023-01-11T22:51:00.7170495Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7170592Z fn() 2023-01-11T22:51:00.7170947Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7171055Z test(self, **param_kwargs) 2023-01-11T22:51:00.7171402Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7171524Z return func(*args, **kwargs) 2023-01-11T22:51:00.7171768Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7171873Z self.run_subtests( 2023-01-11T22:51:00.7172219Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7172376Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7172727Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7172881Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7173251Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7173425Z output = model(*input) 2023-01-11T22:51:00.7173734Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7173875Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7174244Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7174412Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7174773Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.7174890Z _lazy_init(state, module) 2023-01-11T22:51:00.7175234Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7175398Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7175778Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7175916Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7176296Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7176422Z return func(*args, **kwargs) 2023-01-11T22:51:00.7176981Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7177078Z p_assert( 2023-01-11T22:51:00.7177409Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7177528Z traceback.print_stack() 2023-01-11T22:51:00.7177639Z File "", line 1, in 2023-01-11T22:51:00.7177837Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7177971Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7178167Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7178311Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7178523Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7178620Z self.run() 2023-01-11T22:51:00.7178804Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7178943Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7179278Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7179409Z 
self.run_test(test_name, pipe) 2023-01-11T22:51:00.7179761Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7179879Z getattr(self, test_name)() 2023-01-11T22:51:00.7180226Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7180327Z fn() 2023-01-11T22:51:00.7180671Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7180793Z test(self, **param_kwargs) 2023-01-11T22:51:00.7181139Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7181261Z return func(*args, **kwargs) 2023-01-11T22:51:00.7181513Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7181626Z self.run_subtests( 2023-01-11T22:51:00.7181966Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7182118Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7182459Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7182690Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7183067Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7183184Z output = model(*input) 2023-01-11T22:51:00.7183502Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7183634Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7184005Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:51:00.7184177Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7184521Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7184640Z _lazy_init(state, module) 2023-01-11T22:51:00.7184992Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7185218Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7185625Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7185764Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7186097Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7186219Z return func(*args, **kwargs) 2023-01-11T22:51:00.7186585Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7186671Z p_assert( 2023-01-11T22:51:00.7186998Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7187122Z traceback.print_stack() 2023-01-11T22:51:00.7187351Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7187583Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.7187706Z File "", line 1, in 2023-01-11T22:51:00.7187907Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7188032Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7188228Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7188373Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7188579Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7188674Z self.run() 2023-01-11T22:51:00.7188867Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7189006Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7189344Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7189459Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7189817Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7189937Z getattr(self, test_name)() 2023-01-11T22:51:00.7190289Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7190382Z fn() 2023-01-11T22:51:00.7190741Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7190861Z test(self, **param_kwargs) 2023-01-11T22:51:00.7191207Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7191315Z return func(*args, **kwargs) 2023-01-11T22:51:00.7191616Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7191726Z self.run_subtests( 2023-01-11T22:51:00.7192126Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7192288Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7192650Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7192796Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7193160Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7193262Z output = model(*input) 2023-01-11T22:51:00.7193574Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7193710Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7194078Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7194292Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7194662Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7194777Z _lazy_init(state, module) 2023-01-11T22:51:00.7195119Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7195269Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7195656Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7195793Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7196134Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7196258Z return func(*args, **kwargs) 2023-01-11T22:51:00.7196634Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7196735Z p_assert( 2023-01-11T22:51:00.7197066Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7197174Z traceback.print_stack() 2023-01-11T22:51:00.7197297Z File "", line 1, in 2023-01-11T22:51:00.7197503Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7197642Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7197836Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7197980Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7198191Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7198277Z self.run() 2023-01-11T22:51:00.7198477Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7198623Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7198957Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7199085Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7199436Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7199555Z getattr(self, test_name)() 2023-01-11T22:51:00.7199906Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7199985Z fn() 2023-01-11T22:51:00.7200338Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7200514Z test(self, **param_kwargs) 2023-01-11T22:51:00.7200865Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.7200988Z return func(*args, **kwargs) 2023-01-11T22:51:00.7201238Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7201346Z self.run_subtests( 2023-01-11T22:51:00.7201692Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7201835Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7202189Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7202335Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7202705Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7202820Z output = model(*input) 2023-01-11T22:51:00.7203182Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7203323Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7203693Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7203846Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7204208Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7204323Z _lazy_init(state, module) 2023-01-11T22:51:00.7204669Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7204836Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7205230Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7205372Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.7205705Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7205814Z return func(*args, **kwargs) 2023-01-11T22:51:00.7206180Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7206279Z p_assert( 2023-01-11T22:51:00.7206609Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7206733Z traceback.print_stack() 2023-01-11T22:51:00.7206964Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7207196Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7207326Z File "", line 1, in 2023-01-11T22:51:00.7207517Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7207658Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7207856Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7208004Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7208209Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7208308Z self.run() 2023-01-11T22:51:00.7208504Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7208644Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7208966Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7209093Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7209503Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7209624Z getattr(self, test_name)() 2023-01-11T22:51:00.7209980Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7210072Z fn() 2023-01-11T22:51:00.7210431Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7210549Z test(self, **param_kwargs) 2023-01-11T22:51:00.7210884Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7211007Z return func(*args, **kwargs) 2023-01-11T22:51:00.7211260Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7211370Z self.run_subtests( 2023-01-11T22:51:00.7211720Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7211877Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7212304Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7212457Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7212815Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7212929Z output = model(*input) 2023-01-11T22:51:00.7213252Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7213385Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7213753Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7213922Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7214283Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.7214405Z _lazy_init(state, module) 2023-01-11T22:51:00.7214737Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7214897Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7215288Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7215424Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7215757Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7215876Z return func(*args, **kwargs) 2023-01-11T22:51:00.7216249Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7216351Z p_assert( 2023-01-11T22:51:00.7216852Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7216986Z traceback.print_stack() 2023-01-11T22:51:00.7217111Z File "", line 1, in 2023-01-11T22:51:00.7217311Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7217447Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7217641Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7217790Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7217984Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7218084Z self.run() 2023-01-11T22:51:00.7218279Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7218503Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7218846Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7218982Z 
self.run_test(test_name, pipe) 2023-01-11T22:51:00.7219334Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7219452Z getattr(self, test_name)() 2023-01-11T22:51:00.7219790Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7219887Z fn() 2023-01-11T22:51:00.7220239Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7220358Z test(self, **param_kwargs) 2023-01-11T22:51:00.7220703Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7220825Z return func(*args, **kwargs) 2023-01-11T22:51:00.7221075Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7221255Z self.run_subtests( 2023-01-11T22:51:00.7221604Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7221759Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7222114Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7222264Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7222630Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7222746Z output = model(*input) 2023-01-11T22:51:00.7223066Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7223201Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7223556Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:51:00.7223725Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7224082Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7224196Z _lazy_init(state, module) 2023-01-11T22:51:00.7224536Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7224701Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7225088Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7225225Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7225545Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7225664Z return func(*args, **kwargs) 2023-01-11T22:51:00.7226038Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7226137Z p_assert( 2023-01-11T22:51:00.7226465Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7226587Z traceback.print_stack() 2023-01-11T22:51:00.7226815Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7227041Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.7227152Z File "", line 1, in 2023-01-11T22:51:00.7227356Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7227549Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7227673Z File "", line 1, in 2023-01-11T22:51:00.7227870Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7228017Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7228220Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7228317Z self.run() 2023-01-11T22:51:00.7228505Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7228640Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7228839Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7228979Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7229171Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7229316Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7229656Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7229771Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7230025Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7230130Z self.run() 2023-01-11T22:51:00.7230487Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7230608Z getattr(self, test_name)() 2023-01-11T22:51:00.7230799Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7230940Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7231291Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, 
in wrapper 2023-01-11T22:51:00.7231372Z fn() 2023-01-11T22:51:00.7231702Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7231834Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7232196Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7232315Z test(self, **param_kwargs) 2023-01-11T22:51:00.7232665Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7232782Z getattr(self, test_name)() 2023-01-11T22:51:00.7233135Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7233243Z return func(*args, **kwargs) 2023-01-11T22:51:00.7233592Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7233686Z fn() 2023-01-11T22:51:00.7233935Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7234046Z self.run_subtests( 2023-01-11T22:51:00.7234402Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7234523Z test(self, **param_kwargs) 2023-01-11T22:51:00.7234864Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7235008Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7235354Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7235476Z return func(*args, **kwargs) 2023-01-11T22:51:00.7235825Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 
2023-01-11T22:51:00.7235973Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7236223Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7236386Z self.run_subtests( 2023-01-11T22:51:00.7236761Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7236863Z output = model(*input) 2023-01-11T22:51:00.7237213Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7237366Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7237685Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7237815Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7238171Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7238317Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7238689Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7238845Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7239256Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7239381Z output = model(*input) 2023-01-11T22:51:00.7239743Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7239859Z _lazy_init(state, module) 2023-01-11T22:51:00.7240179Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7240314Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7240658Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7240811Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7241178Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7241347Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7241734Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7241869Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7242227Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7242347Z _lazy_init(state, module) 2023-01-11T22:51:00.7242680Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7242786Z return func(*args, **kwargs) 2023-01-11T22:51:00.7243129Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7243293Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7243668Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7243768Z p_assert( 2023-01-11T22:51:00.7244159Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7244299Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7244632Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7244740Z traceback.print_stack() 2023-01-11T22:51:00.7245069Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", 
line 34, in decorate_context 2023-01-11T22:51:00.7245190Z return func(*args, **kwargs) 2023-01-11T22:51:00.7245559Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7245708Z p_assert( 2023-01-11T22:51:00.7246044Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7246165Z traceback.print_stack() 2023-01-11T22:51:00.7246398Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7246614Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7246738Z File "", line 1, in 2023-01-11T22:51:00.7246939Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7247076Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7247270Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7247416Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7247624Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7247724Z self.run() 2023-01-11T22:51:00.7247949Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7248096Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7248432Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7248559Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7248910Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7249031Z getattr(self, test_name)() 2023-01-11T22:51:00.7249382Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 
2023-01-11T22:51:00.7249463Z fn() 2023-01-11T22:51:00.7249820Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7249940Z test(self, **param_kwargs) 2023-01-11T22:51:00.7250290Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7250410Z return func(*args, **kwargs) 2023-01-11T22:51:00.7250656Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7250764Z self.run_subtests( 2023-01-11T22:51:00.7251109Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7251253Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7251607Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7251757Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7252123Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7252239Z output = model(*input) 2023-01-11T22:51:00.7252563Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7252693Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7253059Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7253214Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7253570Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7253682Z _lazy_init(state, module) 2023-01-11T22:51:00.7254021Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7254182Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7254627Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7254768Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7255104Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7255224Z return func(*args, **kwargs) 2023-01-11T22:51:00.7255581Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7255677Z p_assert( 2023-01-11T22:51:00.7256013Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7256133Z traceback.print_stack() 2023-01-11T22:51:00.7256257Z File "", line 1, in 2023-01-11T22:51:00.7256459Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7256807Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7257001Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7257219Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7257434Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7257533Z self.run() 2023-01-11T22:51:00.7257727Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7257869Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7258209Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7258338Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7258675Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7258796Z getattr(self, test_name)() 2023-01-11T22:51:00.7259156Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7259247Z fn() 2023-01-11T22:51:00.7259605Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7259725Z test(self, **param_kwargs) 2023-01-11T22:51:00.7260076Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7260192Z return func(*args, **kwargs) 2023-01-11T22:51:00.7260429Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7260538Z self.run_subtests( 2023-01-11T22:51:00.7260877Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7261033Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7261394Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7261541Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7261913Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7262028Z output = model(*input) 2023-01-11T22:51:00.7262332Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7262466Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7262830Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7262996Z args, kwargs = 
_root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7263351Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7263543Z _lazy_init(state, module) 2023-01-11T22:51:00.7263893Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7264059Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7264436Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7264572Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7264904Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7265025Z return func(*args, **kwargs) 2023-01-11T22:51:00.7265389Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7265484Z p_assert( 2023-01-11T22:51:00.7265817Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7265942Z traceback.print_stack() 2023-01-11T22:51:00.7266160Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7266436Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.7266568Z File "", line 1, in 2023-01-11T22:51:00.7266773Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7266912Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7267107Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7267247Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7267451Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7267537Z self.run() 2023-01-11T22:51:00.7267731Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7267877Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7268215Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7268345Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7268697Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7268812Z getattr(self, test_name)() 2023-01-11T22:51:00.7269151Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7269248Z fn() 2023-01-11T22:51:00.7269605Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7269723Z test(self, **param_kwargs) 2023-01-11T22:51:00.7270072Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7270194Z return func(*args, **kwargs) 2023-01-11T22:51:00.7270445Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7270557Z self.run_subtests( 2023-01-11T22:51:00.7270892Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7271047Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7271403Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7271552Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7271916Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7272032Z output = model(*input) 2023-01-11T22:51:00.7272350Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7272539Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7272650Z File "", line 1, in 2023-01-11T22:51:00.7273024Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7273190Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7273551Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7273667Z _lazy_init(state, module) 2023-01-11T22:51:00.7273867Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7274001Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7274345Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7274495Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7274693Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7274840Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7275273Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7275414Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7275623Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7275725Z self.run() 2023-01-11T22:51:00.7276060Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7276168Z return func(*args, **kwargs) 2023-01-11T22:51:00.7276361Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7276499Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7276873Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7276969Z p_assert( 2023-01-11T22:51:00.7277300Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7277432Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7277756Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7277864Z traceback.print_stack() 2023-01-11T22:51:00.7278216Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7278333Z getattr(self, test_name)() 2023-01-11T22:51:00.7278685Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7278777Z fn() 2023-01-11T22:51:00.7279133Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7279256Z test(self, **param_kwargs) 2023-01-11T22:51:00.7279608Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", 
line 167, in wrapper 2023-01-11T22:51:00.7279715Z return func(*args, **kwargs) 2023-01-11T22:51:00.7279962Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7280074Z self.run_subtests( 2023-01-11T22:51:00.7280419Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7280577Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7280937Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7281086Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7281510Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7281612Z output = model(*input) 2023-01-11T22:51:00.7281932Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7282065Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7282434Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7282603Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7282962Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7283077Z _lazy_init(state, module) 2023-01-11T22:51:00.7283418Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7283572Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7283963Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7284156Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.7284496Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7284620Z return func(*args, **kwargs) 2023-01-11T22:51:00.7284990Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7285089Z p_assert( 2023-01-11T22:51:00.7285420Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7285528Z traceback.print_stack() 2023-01-11T22:51:00.7285761Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7285994Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7286118Z File "<string>", line 1, in <module> 2023-01-11T22:51:00.7286322Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7286458Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7286650Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7286795Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7286989Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7287088Z self.run() 2023-01-11T22:51:00.7287286Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7287428Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7287765Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7287899Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7288254Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7288362Z getattr(self, test_name)() 2023-01-11T22:51:00.7288714Z
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7288810Z fn() 2023-01-11T22:51:00.7289160Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7289277Z test(self, **param_kwargs) 2023-01-11T22:51:00.7289624Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7289742Z return func(*args, **kwargs) 2023-01-11T22:51:00.7289989Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7290141Z self.run_subtests( 2023-01-11T22:51:00.7290486Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7290645Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7290996Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7291138Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7291506Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7291622Z output = model(*input) 2023-01-11T22:51:00.7291943Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7292116Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7292490Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7292663Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7293068Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.7293187Z _lazy_init(state, module) 2023-01-11T22:51:00.7293534Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7293697Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7294085Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7294210Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7294539Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7294659Z return func(*args, **kwargs) 2023-01-11T22:51:00.7295034Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7295131Z p_assert( 2023-01-11T22:51:00.7295467Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7295588Z traceback.print_stack() 2023-01-11T22:51:00.7295711Z File "<string>", line 1, in <module> 2023-01-11T22:51:00.7295901Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7296041Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7296232Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7296377Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7296802Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7296914Z self.run() 2023-01-11T22:51:00.7297111Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7297255Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7297592Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7297721Z
self.run_test(test_name, pipe) 2023-01-11T22:51:00.7298076Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7298195Z getattr(self, test_name)() 2023-01-11T22:51:00.7298541Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7298635Z fn() 2023-01-11T22:51:00.7298993Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7299099Z test(self, **param_kwargs) 2023-01-11T22:51:00.7299446Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7299653Z return func(*args, **kwargs) 2023-01-11T22:51:00.7299903Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7300014Z self.run_subtests( 2023-01-11T22:51:00.7300365Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7300519Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7300873Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7301022Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7301376Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7301490Z output = model(*input) 2023-01-11T22:51:00.7301808Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7301945Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7302372Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:51:00.7302551Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7302910Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7303029Z _lazy_init(state, module) 2023-01-11T22:51:00.7303359Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7303523Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7303912Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7304053Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7304387Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7304510Z return func(*args, **kwargs) 2023-01-11T22:51:00.7304880Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7304978Z p_assert( 2023-01-11T22:51:00.7305294Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7305415Z traceback.print_stack() 2023-01-11T22:51:00.7305646Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7305875Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7306102Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7306334Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7306560Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.7306787Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7306995Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7307213Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7307430Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7307653Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7307871Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7308093Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7308370Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7308594Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7308801Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7309023Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7309240Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7309459Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7309678Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7309906Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7310407Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7310869Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.7311380Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7311855Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7312313Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7312794Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7313268Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7313741Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7314191Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7314657Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7315125Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7315592Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7316043Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7316514Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7316979Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7317434Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7317898Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7318362Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7318835Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.7319281Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7319757Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7320223Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7320672Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7321134Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7321601Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7322061Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7330320Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7330921Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7331282Z dist init r=1, world=2 2023-01-11T22:51:00.7331751Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.7332391Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.7333016Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.7333646Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.7334271Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.7334960Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.7335598Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.7336228Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.7337237Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.7337855Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.7338484Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.7339079Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.7339499Z dist init r=0, world=2 2023-01-11T22:51:00.7339944Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.7340551Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.7341144Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.7341752Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.7342370Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.7342979Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.7343591Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.7344182Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.7344909Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.7345527Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.7346128Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.7346714Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.7347125Z ok (6.715s) 2023-01-11T22:51:00.7347590Z test_nested_always_wrap_model_offload_true_shard_grad_op (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 94775 2023-01-11T22:51:00.7348139Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 94776 2023-01-11T22:51:00.7348765Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.7349286Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.7349865Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.7350307Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.7350878Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.7351311Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.7351860Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.7352296Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.7352752Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.7353246Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.7353896Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.7354559Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:51:00.7355074Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.7355535Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.7355993Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7356469Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7357738Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.7358505Z warnings.warn( 2023-01-11T22:51:00.7359643Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:51:00.7360453Z warnings.warn( 2023-01-11T22:51:00.7360700Z File "<string>", line 1, in <module> 2023-01-11T22:51:00.7361068Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7361432Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7361783Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7362142Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7362513Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7362822Z self.run() 2023-01-11T22:51:00.7363136Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7363491Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7363984Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7364364Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7364878Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7365308Z getattr(self, test_name)() 2023-01-11T22:51:00.7365806Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7366156Z fn() 2023-01-11T22:51:00.7366634Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7367000Z test(self, **param_kwargs) 2023-01-11T22:51:00.7367498Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7367873Z return func(*args, **kwargs) 2023-01-11T22:51:00.7368271Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7368626Z self.run_subtests( 2023-01-11T22:51:00.7369110Z File
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7369530Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7370053Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7370468Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7371012Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7371398Z output = model(*input) 2023-01-11T22:51:00.7371845Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7372216Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7372746Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7373173Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7373717Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7374091Z _lazy_init(state, module) 2023-01-11T22:51:00.7374587Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7374991Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7375566Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7375986Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7376465Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7377088Z return func(*args, **kwargs) 2023-01-11T22:51:00.7377709Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7378077Z p_assert( 2023-01-11T22:51:00.7378526Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7378894Z traceback.print_stack() 2023-01-11T22:51:00.7379166Z File "<string>", line 1, in <module> 2023-01-11T22:51:00.7379512Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7379868Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7380224Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7380624Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7381018Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7381341Z self.run() 2023-01-11T22:51:00.7381663Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7382006Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7382591Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7382979Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7383484Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7383865Z getattr(self, test_name)() 2023-01-11T22:51:00.7384365Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7384702Z fn() 2023-01-11T22:51:00.7385177Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7385552Z test(self, **param_kwargs) 2023-01-11T22:51:00.7386049Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in
wrapper 2023-01-11T22:51:00.7386413Z return func(*args, **kwargs) 2023-01-11T22:51:00.7386812Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7387180Z self.run_subtests( 2023-01-11T22:51:00.7387653Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7388057Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7388592Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7389005Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7389529Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7389912Z output = model(*input) 2023-01-11T22:51:00.7390375Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7390735Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7391266Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7391701Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7392292Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7392662Z _lazy_init(state, module) 2023-01-11T22:51:00.7393159Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7393577Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7394133Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7394653Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.7395156Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7395534Z return func(*args, **kwargs) 2023-01-11T22:51:00.7396040Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7396405Z p_assert( 2023-01-11T22:51:00.7396862Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7397215Z traceback.print_stack() 2023-01-11T22:51:00.7397594Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7398060Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7398425Z File "<string>", line 1, in <module> 2023-01-11T22:51:00.7398770Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7399133Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7399488Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7399883Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7400270Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7400590Z self.run() 2023-01-11T22:51:00.7400901Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7401253Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7401760Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7402138Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7402633Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7403011Z getattr(self, test_name)() 2023-01-11T22:51:00.7403515Z
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7403852Z fn() 2023-01-11T22:51:00.7404332Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7404714Z test(self, **param_kwargs) 2023-01-11T22:51:00.7405199Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7405571Z return func(*args, **kwargs) 2023-01-11T22:51:00.7405968Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7406329Z self.run_subtests( 2023-01-11T22:51:00.7406804Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7407219Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7407755Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7408148Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7408686Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7409066Z output = model(*input) 2023-01-11T22:51:00.7409527Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7409887Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7410418Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7410856Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7411388Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.7411829Z _lazy_init(state, module) 2023-01-11T22:51:00.7412324Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7412748Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7413306Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7413722Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7414214Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7414567Z return func(*args, **kwargs) 2023-01-11T22:51:00.7415079Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7415440Z p_assert( 2023-01-11T22:51:00.7415894Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7416251Z traceback.print_stack() 2023-01-11T22:51:00.7416521Z File "", line 1, in 2023-01-11T22:51:00.7417152Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7417507Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7417865Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7418221Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7418584Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7418899Z self.run() 2023-01-11T22:51:00.7419223Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7419570Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7420065Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7420438Z 
self.run_test(test_name, pipe) 2023-01-11T22:51:00.7420950Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7421319Z getattr(self, test_name)() 2023-01-11T22:51:00.7421816Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7422166Z fn() 2023-01-11T22:51:00.7422626Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7423009Z test(self, **param_kwargs) 2023-01-11T22:51:00.7423505Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7423878Z return func(*args, **kwargs) 2023-01-11T22:51:00.7424257Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7424622Z self.run_subtests( 2023-01-11T22:51:00.7425100Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7425494Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7426021Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7426425Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7426957Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7427329Z output = model(*input) 2023-01-11T22:51:00.7427791Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7428163Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7428681Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:51:00.7429201Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7429760Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7430134Z _lazy_init(state, module) 2023-01-11T22:51:00.7430610Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7431020Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7431591Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7431993Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7432486Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7432849Z return func(*args, **kwargs) 2023-01-11T22:51:00.7433369Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7433721Z p_assert( 2023-01-11T22:51:00.7434224Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7434592Z traceback.print_stack() 2023-01-11T22:51:00.7434961Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7435433Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.7435803Z File "", line 1, in 2023-01-11T22:51:00.7436160Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7436506Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7436860Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7437223Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7437583Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7437904Z self.run() 2023-01-11T22:51:00.7438224Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7438562Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7439068Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7439435Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7439950Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7440311Z getattr(self, test_name)() 2023-01-11T22:51:00.7440816Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7441167Z fn() 2023-01-11T22:51:00.7441624Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7442005Z test(self, **param_kwargs) 2023-01-11T22:51:00.7442510Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7442871Z return func(*args, **kwargs) 2023-01-11T22:51:00.7443264Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7443625Z self.run_subtests( 2023-01-11T22:51:00.7444116Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7444511Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7445049Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7445453Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7446041Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7446425Z output = model(*input) 2023-01-11T22:51:00.7446894Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7447271Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7447784Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7448219Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7448767Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7449131Z _lazy_init(state, module) 2023-01-11T22:51:00.7449622Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7450044Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7450663Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7451078Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7451583Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7451945Z return func(*args, **kwargs) 2023-01-11T22:51:00.7452454Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7452827Z p_assert( 2023-01-11T22:51:00.7453280Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7453644Z traceback.print_stack() 2023-01-11T22:51:00.7453907Z File "", line 1, in 2023-01-11T22:51:00.7454266Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7454630Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7454978Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7455330Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7455703Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7456007Z self.run() 2023-01-11T22:51:00.7456322Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7456886Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7457392Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7457746Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7458250Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7458636Z getattr(self, test_name)() 2023-01-11T22:51:00.7459127Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7459480Z fn() 2023-01-11T22:51:00.7459957Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7460336Z test(self, **param_kwargs) 2023-01-11T22:51:00.7460820Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.7461191Z return func(*args, **kwargs) 2023-01-11T22:51:00.7461589Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7461938Z self.run_subtests( 2023-01-11T22:51:00.7462426Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7462928Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7463455Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7463866Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7464402Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7464775Z output = model(*input) 2023-01-11T22:51:00.7465222Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7465588Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7466115Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7466539Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7467078Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7467455Z _lazy_init(state, module) 2023-01-11T22:51:00.7468010Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7468422Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7468997Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7469416Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.7469897Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7470257Z return func(*args, **kwargs) 2023-01-11T22:51:00.7470773Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7471134Z p_assert( 2023-01-11T22:51:00.7471579Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7471939Z traceback.print_stack() 2023-01-11T22:51:00.7472320Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7472774Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7473142Z File "", line 1, in 2023-01-11T22:51:00.7473501Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7473850Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7474188Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7474542Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7474916Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7475225Z self.run() 2023-01-11T22:51:00.7475549Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7475904Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7476398Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7476772Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7477284Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7477651Z getattr(self, test_name)() 2023-01-11T22:51:00.7478136Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7478489Z fn() 2023-01-11T22:51:00.7478962Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7479331Z test(self, **param_kwargs) 2023-01-11T22:51:00.7479832Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7480275Z return func(*args, **kwargs) 2023-01-11T22:51:00.7480663Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7481029Z self.run_subtests( 2023-01-11T22:51:00.7481519Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7481935Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7482455Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7482862Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7483405Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7483782Z output = model(*input) 2023-01-11T22:51:00.7484237Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7484606Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7485180Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7485609Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7486155Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.7486528Z _lazy_init(state, module) 2023-01-11T22:51:00.7487002Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7487421Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7487997Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7488420Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7488905Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7489276Z return func(*args, **kwargs) 2023-01-11T22:51:00.7489800Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7490159Z p_assert( 2023-01-11T22:51:00.7490616Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7490982Z traceback.print_stack() 2023-01-11T22:51:00.7491259Z File "", line 1, in 2023-01-11T22:51:00.7491602Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7491959Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7492371Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7492722Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7493097Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7493411Z self.run() 2023-01-11T22:51:00.7493721Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7494066Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7494559Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7494929Z 
self.run_test(test_name, pipe) 2023-01-11T22:51:00.7495421Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7495800Z getattr(self, test_name)() 2023-01-11T22:51:00.7496296Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7496811Z fn() 2023-01-11T22:51:00.7497379Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7497753Z test(self, **param_kwargs) 2023-01-11T22:51:00.7498253Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7498613Z return func(*args, **kwargs) 2023-01-11T22:51:00.7499009Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7499370Z self.run_subtests( 2023-01-11T22:51:00.7499843Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7500254Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7500779Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7501180Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7501710Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7502168Z output = model(*input) 2023-01-11T22:51:00.7502648Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7503006Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7503538Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:51:00.7503970Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7504506Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7504887Z _lazy_init(state, module) 2023-01-11T22:51:00.7505374Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7505795Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7506355Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7506765Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7507266Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7507637Z return func(*args, **kwargs) 2023-01-11T22:51:00.7508141Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7508502Z p_assert( 2023-01-11T22:51:00.7508951Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7509299Z traceback.print_stack() 2023-01-11T22:51:00.7509674Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7510157Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.7510513Z File "", line 1, in 2023-01-11T22:51:00.7510879Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7511232Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7511582Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7511924Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7512295Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7512617Z self.run() 2023-01-11T22:51:00.7512925Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7513285Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7513789Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7514212Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7514728Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7515102Z getattr(self, test_name)() 2023-01-11T22:51:00.7515599Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7515937Z fn() 2023-01-11T22:51:00.7516408Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7516781Z test(self, **param_kwargs) 2023-01-11T22:51:00.7517260Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7517638Z return func(*args, **kwargs) 2023-01-11T22:51:00.7518031Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7518399Z self.run_subtests( 2023-01-11T22:51:00.7518870Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7519327Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7519867Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7520262Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7520801Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7521183Z output = model(*input) 2023-01-11T22:51:00.7521647Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7522003Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7522533Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7522973Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7523513Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7523892Z _lazy_init(state, module) 2023-01-11T22:51:00.7524381Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7524793Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7525347Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7525755Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7526254Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7526609Z return func(*args, **kwargs) 2023-01-11T22:51:00.7527127Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7527499Z p_assert( 2023-01-11T22:51:00.7527943Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7528311Z traceback.print_stack() 2023-01-11T22:51:00.7528582Z File "", line 1, in 2023-01-11T22:51:00.7528935Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7529278Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7529637Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7529991Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7530354Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7530665Z self.run() 2023-01-11T22:51:00.7531043Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7531382Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7531883Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7532259Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7532769Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7533134Z getattr(self, test_name)() 2023-01-11T22:51:00.7533634Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7533986Z fn() 2023-01-11T22:51:00.7534448Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7534820Z test(self, **param_kwargs) 2023-01-11T22:51:00.7535322Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.7535702Z return func(*args, **kwargs) 2023-01-11T22:51:00.7536130Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7536502Z self.run_subtests( 2023-01-11T22:51:00.7537237Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7537634Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7538169Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7538570Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7539112Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7539478Z output = model(*input) 2023-01-11T22:51:00.7539947Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7540315Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7540695Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7540854Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7541214Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7541333Z _lazy_init(state, module) 2023-01-11T22:51:00.7541678Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7541843Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7542236Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7542374Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.7542708Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7542816Z return func(*args, **kwargs) 2023-01-11T22:51:00.7543188Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7543285Z p_assert( 2023-01-11T22:51:00.7543618Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7543743Z traceback.print_stack() 2023-01-11T22:51:00.7543977Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7544205Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7544328Z File "", line 1, in 2023-01-11T22:51:00.7544607Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7544743Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7544938Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7545084Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7545292Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7545395Z self.run() 2023-01-11T22:51:00.7545588Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7545716Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7546052Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7546179Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7546536Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7546656Z getattr(self, test_name)() 2023-01-11T22:51:00.7547004Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7547157Z fn() 2023-01-11T22:51:00.7547528Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7547635Z test(self, **param_kwargs) 2023-01-11T22:51:00.7547985Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7548105Z return func(*args, **kwargs) 2023-01-11T22:51:00.7548356Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7548471Z self.run_subtests( 2023-01-11T22:51:00.7548817Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7548978Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7549336Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7549475Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7549844Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7549959Z output = model(*input) 2023-01-11T22:51:00.7550282Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7550416Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7550786Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7550955Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7551315Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.7551423Z _lazy_init(state, module) 2023-01-11T22:51:00.7551772Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7551935Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7552330Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7552470Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7552800Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7552919Z return func(*args, **kwargs) 2023-01-11T22:51:00.7553292Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7553389Z p_assert( 2023-01-11T22:51:00.7553763Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7553886Z traceback.print_stack() 2023-01-11T22:51:00.7554013Z File "", line 1, in 2023-01-11T22:51:00.7554219Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7554359Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7554557Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7554703Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7554896Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7554990Z self.run() 2023-01-11T22:51:00.7555181Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7555323Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7555658Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7555789Z 
self.run_test(test_name, pipe) 2023-01-11T22:51:00.7556183Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7556306Z getattr(self, test_name)() 2023-01-11T22:51:00.7556653Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7556747Z fn() 2023-01-11T22:51:00.7557104Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7557225Z test(self, **param_kwargs) 2023-01-11T22:51:00.7557576Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7557696Z return func(*args, **kwargs) 2023-01-11T22:51:00.7557945Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7558059Z self.run_subtests( 2023-01-11T22:51:00.7558399Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7558555Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7558912Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7559059Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7559427Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7559543Z output = model(*input) 2023-01-11T22:51:00.7559862Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7559993Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7560351Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:51:00.7560521Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7560883Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7560998Z _lazy_init(state, module) 2023-01-11T22:51:00.7561343Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7561503Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7561896Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7562031Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7562350Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7562530Z return func(*args, **kwargs) 2023-01-11T22:51:00.7562900Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7562999Z p_assert( 2023-01-11T22:51:00.7563325Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7563441Z traceback.print_stack() 2023-01-11T22:51:00.7563670Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7563901Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.7564014Z File "", line 1, in 2023-01-11T22:51:00.7564214Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7564352Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7564547Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7564701Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7564902Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7565048Z self.run() 2023-01-11T22:51:00.7565238Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7565379Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7565713Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7565842Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7566198Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7566317Z getattr(self, test_name)() 2023-01-11T22:51:00.7566665Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7566764Z fn() 2023-01-11T22:51:00.7567105Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7567227Z test(self, **param_kwargs) 2023-01-11T22:51:00.7567579Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7567698Z return func(*args, **kwargs) 2023-01-11T22:51:00.7567943Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7568052Z self.run_subtests( 2023-01-11T22:51:00.7568401Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7568563Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7568908Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7569061Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7569429Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7569549Z output = model(*input) 2023-01-11T22:51:00.7569871Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7570009Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7570381Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7570551Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7570895Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7571017Z _lazy_init(state, module) 2023-01-11T22:51:00.7571364Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7571578Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7571969Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7572108Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7572439Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7572556Z return func(*args, **kwargs) 2023-01-11T22:51:00.7572915Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7573011Z p_assert( 2023-01-11T22:51:00.7573334Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7573452Z traceback.print_stack() 2023-01-11T22:51:00.7573574Z File "", line 1, in 2023-01-11T22:51:00.7573776Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7573907Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7574146Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7574285Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7574491Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7574587Z self.run() 2023-01-11T22:51:00.7574781Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7574921Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7575252Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7575384Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7575735Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7575846Z getattr(self, test_name)() 2023-01-11T22:51:00.7576203Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7576296Z fn() 2023-01-11T22:51:00.7576834Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7576960Z test(self, **param_kwargs) 2023-01-11T22:51:00.7577317Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.7577437Z return func(*args, **kwargs) 2023-01-11T22:51:00.7577673Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7577785Z self.run_subtests( 2023-01-11T22:51:00.7578130Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7578291Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7578650Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7578801Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7579167Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7579284Z output = model(*input) 2023-01-11T22:51:00.7579588Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7579723Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7580091Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7580262Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7580708Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7580826Z _lazy_init(state, module) 2023-01-11T22:51:00.7581175Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7581340Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7581733Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7581857Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.7582190Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7582310Z return func(*args, **kwargs) 2023-01-11T22:51:00.7582676Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7582779Z p_assert( 2023-01-11T22:51:00.7583107Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7583231Z traceback.print_stack() 2023-01-11T22:51:00.7583536Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7583762Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7583889Z File "", line 1, in 2023-01-11T22:51:00.7584095Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7584235Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7584435Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7584582Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7584790Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7584880Z self.run() 2023-01-11T22:51:00.7585074Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7585217Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7585558Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7585686Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7586041Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7586159Z getattr(self, test_name)() 2023-01-11T22:51:00.7586510Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7586590Z fn() 2023-01-11T22:51:00.7586947Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7587066Z test(self, **param_kwargs) 2023-01-11T22:51:00.7587424Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7587546Z return func(*args, **kwargs) 2023-01-11T22:51:00.7587796Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7587905Z self.run_subtests( 2023-01-11T22:51:00.7588252Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7588395Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7588750Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7588894Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7589263Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7589433Z output = model(*input) 2023-01-11T22:51:00.7589751Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7589883Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7590256Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7590412Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7590769Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.7590884Z _lazy_init(state, module) 2023-01-11T22:51:00.7591230Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7591390Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7591781Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7591920Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7592352Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7592469Z return func(*args, **kwargs) 2023-01-11T22:51:00.7592840Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7592943Z p_assert( 2023-01-11T22:51:00.7593274Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7593399Z traceback.print_stack() 2023-01-11T22:51:00.7593522Z File "", line 1, in 2023-01-11T22:51:00.7593726Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7593864Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7594047Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7594197Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7594406Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7594504Z self.run() 2023-01-11T22:51:00.7594699Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7594841Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7595177Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7595292Z 
self.run_test(test_name, pipe) 2023-01-11T22:51:00.7595647Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7595762Z getattr(self, test_name)() 2023-01-11T22:51:00.7596114Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7596210Z fn() 2023-01-11T22:51:00.7596567Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7596691Z test(self, **param_kwargs) 2023-01-11T22:51:00.7597040Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7597147Z return func(*args, **kwargs) 2023-01-11T22:51:00.7597393Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7597499Z self.run_subtests( 2023-01-11T22:51:00.7597845Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7597995Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7598347Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7598548Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7598920Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7599023Z output = model(*input) 2023-01-11T22:51:00.7599339Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7599473Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7599846Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:51:00.7600017Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7600370Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7600488Z _lazy_init(state, module) 2023-01-11T22:51:00.7600831Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7600999Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7601454Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7601601Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7601936Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7602057Z return func(*args, **kwargs) 2023-01-11T22:51:00.7602428Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7602524Z p_assert( 2023-01-11T22:51:00.7602852Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7602972Z traceback.print_stack() 2023-01-11T22:51:00.7603194Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7603424Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.7603554Z File "", line 1, in 2023-01-11T22:51:00.7603756Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7603894Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7604091Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7604240Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7604431Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7604529Z self.run() 2023-01-11T22:51:00.7604723Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7604871Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7605211Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7605337Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7605693Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7605810Z getattr(self, test_name)() 2023-01-11T22:51:00.7606147Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7606236Z fn() 2023-01-11T22:51:00.7606589Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7606708Z test(self, **param_kwargs) 2023-01-11T22:51:00.7607062Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7607186Z return func(*args, **kwargs) 2023-01-11T22:51:00.7607434Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7607596Z self.run_subtests( 2023-01-11T22:51:00.7607935Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7608092Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7608452Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7608597Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7608965Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7609080Z output = model(*input) 2023-01-11T22:51:00.7609398Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7609531Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7609886Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7610099Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7610468Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7610582Z _lazy_init(state, module) 2023-01-11T22:51:00.7610927Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7611090Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7611482Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7611617Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7611935Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7612060Z return func(*args, **kwargs) 2023-01-11T22:51:00.7612432Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7612531Z p_assert( 2023-01-11T22:51:00.7612863Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7612984Z traceback.print_stack() 2023-01-11T22:51:00.7613109Z File "", line 1, in 2023-01-11T22:51:00.7613310Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7613433Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7613634Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7613776Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7613982Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7614087Z self.run() 2023-01-11T22:51:00.7614276Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7614415Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7614736Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7614861Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7615209Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7615331Z getattr(self, test_name)() 2023-01-11T22:51:00.7615683Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7615777Z fn() 2023-01-11T22:51:00.7616135Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7616254Z test(self, **param_kwargs) 2023-01-11T22:51:00.7616892Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.7617019Z return func(*args, **kwargs) 2023-01-11T22:51:00.7617271Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7617378Z self.run_subtests( 2023-01-11T22:51:00.7617735Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7617888Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7618242Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7618388Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7618740Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7618860Z output = model(*input) 2023-01-11T22:51:00.7619176Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7619379Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7619764Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7619933Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7620290Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7620404Z _lazy_init(state, module) 2023-01-11T22:51:00.7620734Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7620899Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7621289Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7621433Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.7621771Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7621896Z return func(*args, **kwargs) 2023-01-11T22:51:00.7622266Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7622366Z p_assert( 2023-01-11T22:51:00.7622681Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7622802Z traceback.print_stack() 2023-01-11T22:51:00.7623029Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7623259Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7623390Z File "", line 1, in 2023-01-11T22:51:00.7623592Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7623729Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7623929Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7624061Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7624268Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7624366Z self.run() 2023-01-11T22:51:00.7624561Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7624698Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7625034Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7625161Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7625509Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7625687Z getattr(self, test_name)() 2023-01-11T22:51:00.7626048Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7626141Z fn() 2023-01-11T22:51:00.7626500Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7626619Z test(self, **param_kwargs) 2023-01-11T22:51:00.7626968Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7627090Z return func(*args, **kwargs) 2023-01-11T22:51:00.7627336Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7627433Z self.run_subtests( 2023-01-11T22:51:00.7627776Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7627938Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7628338Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7628494Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7628861Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7628980Z output = model(*input) 2023-01-11T22:51:00.7629296Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7629417Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7629790Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7629957Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7630318Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.7630439Z _lazy_init(state, module) 2023-01-11T22:51:00.7630791Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7630954Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7631345Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7631469Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7631797Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7631919Z return func(*args, **kwargs) 2023-01-11T22:51:00.7632287Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7632389Z p_assert( 2023-01-11T22:51:00.7632713Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7632834Z traceback.print_stack() 2023-01-11T22:51:00.7632957Z File "", line 1, in 2023-01-11T22:51:00.7633147Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7633284Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7633479Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7633628Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7633832Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7633931Z self.run() 2023-01-11T22:51:00.7634128Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7634256Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7634644Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7634768Z 
self.run_test(test_name, pipe) 2023-01-11T22:51:00.7635125Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7635243Z getattr(self, test_name)() 2023-01-11T22:51:00.7635595Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7635688Z fn() 2023-01-11T22:51:00.7636041Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7636146Z test(self, **param_kwargs) 2023-01-11T22:51:00.7636492Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7636609Z return func(*args, **kwargs) 2023-01-11T22:51:00.7636858Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7636968Z self.run_subtests( 2023-01-11T22:51:00.7637357Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7637519Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7637873Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7638006Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7638373Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7638486Z output = model(*input) 2023-01-11T22:51:00.7638800Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7638935Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7639304Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:51:00.7639477Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7639833Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7639935Z _lazy_init(state, module) 2023-01-11T22:51:00.7640282Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7640446Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7640838Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7640974Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7641303Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7641425Z return func(*args, **kwargs) 2023-01-11T22:51:00.7641796Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7641882Z p_assert( 2023-01-11T22:51:00.7642208Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7642335Z traceback.print_stack() 2023-01-11T22:51:00.7642566Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7642796Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.7642919Z File "", line 1, in 2023-01-11T22:51:00.7643124Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7643259Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7643499Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7643644Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7643851Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7643946Z self.run() 2023-01-11T22:51:00.7644140Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7644285Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7644619Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7644748Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7645090Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7645208Z getattr(self, test_name)() 2023-01-11T22:51:00.7645563Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7645665Z fn() 2023-01-11T22:51:00.7646020Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7646197Z test(self, **param_kwargs) 2023-01-11T22:51:00.7646556Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7646663Z return func(*args, **kwargs) 2023-01-11T22:51:00.7646914Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7647026Z self.run_subtests( 2023-01-11T22:51:00.7647370Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7647523Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7647874Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7648025Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7648400Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7648514Z output = model(*input) 2023-01-11T22:51:00.7648818Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7648953Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7649322Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7649492Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7649852Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7649970Z _lazy_init(state, module) 2023-01-11T22:51:00.7650315Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7650480Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7650858Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7650998Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7651329Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7651449Z return func(*args, **kwargs) 2023-01-11T22:51:00.7651815Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7651915Z p_assert( 2023-01-11T22:51:00.7652242Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7652362Z traceback.print_stack() 2023-01-11T22:51:00.7652532Z File "", line 1, in 2023-01-11T22:51:00.7652729Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7652870Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7653067Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7653212Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7653417Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7653516Z self.run() 2023-01-11T22:51:00.7653698Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7653841Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7654179Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7654308Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7654657Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7654778Z getattr(self, test_name)() 2023-01-11T22:51:00.7655174Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7655273Z fn() 2023-01-11T22:51:00.7655618Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7655739Z test(self, **param_kwargs) 2023-01-11T22:51:00.7656085Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.7656208Z return func(*args, **kwargs) 2023-01-11T22:51:00.7656455Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7656729Z self.run_subtests( 2023-01-11T22:51:00.7657094Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7657257Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7657608Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7657754Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7658123Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7658237Z output = model(*input) 2023-01-11T22:51:00.7658555Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7658687Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7659057Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7659221Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7659570Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7659690Z _lazy_init(state, module) 2023-01-11T22:51:00.7660037Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7660199Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7660588Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7660725Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.7661053Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7661170Z return func(*args, **kwargs) 2023-01-11T22:51:00.7661523Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7661705Z p_assert( 2023-01-11T22:51:00.7662039Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7662162Z traceback.print_stack() 2023-01-11T22:51:00.7662394Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7662626Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7662755Z File "", line 1, in 2023-01-11T22:51:00.7662959Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7663083Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7663281Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7663423Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7663629Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7663734Z self.run() 2023-01-11T22:51:00.7663930Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7664128Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7664473Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7664588Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7664944Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7665065Z getattr(self, test_name)() 2023-01-11T22:51:00.7665420Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7665513Z fn() 2023-01-11T22:51:00.7665870Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7665992Z test(self, **param_kwargs) 2023-01-11T22:51:00.7666325Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7666449Z return func(*args, **kwargs) 2023-01-11T22:51:00.7666701Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7666807Z self.run_subtests( 2023-01-11T22:51:00.7667155Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7667308Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7667664Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7667812Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7668165Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7668281Z output = model(*input) 2023-01-11T22:51:00.7668600Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7668732Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7669099Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7669268Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7669628Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.7669747Z _lazy_init(state, module) 2023-01-11T22:51:00.7670094Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7670244Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7670690Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7670825Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7671160Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7671278Z return func(*args, **kwargs) 2023-01-11T22:51:00.7671652Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7671754Z p_assert( 2023-01-11T22:51:00.7672081Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7672189Z traceback.print_stack() 2023-01-11T22:51:00.7672312Z File "", line 1, in 2023-01-11T22:51:00.7672512Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7672653Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7672852Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7672997Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7673247Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7673339Z self.run() 2023-01-11T22:51:00.7673536Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7673681Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7674018Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7674150Z 
self.run_test(test_name, pipe) 2023-01-11T22:51:00.7674503Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7674622Z getattr(self, test_name)() 2023-01-11T22:51:00.7674975Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7675059Z fn() 2023-01-11T22:51:00.7675415Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7675532Z test(self, **param_kwargs) 2023-01-11T22:51:00.7675883Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7676005Z return func(*args, **kwargs) 2023-01-11T22:51:00.7676253Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:51:00.7676362Z self.run_subtests( 2023-01-11T22:51:00.7676707Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7676851Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7677201Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7677354Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7677721Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7677839Z output = model(*input) 2023-01-11T22:51:00.7678160Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7678293Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7678661Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:51:00.7678815Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7679176Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7679290Z _lazy_init(state, module) 2023-01-11T22:51:00.7679692Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7679858Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7680247Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7680382Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7680776Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7680885Z return func(*args, **kwargs) 2023-01-11T22:51:00.7681258Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7681358Z p_assert( 2023-01-11T22:51:00.7681689Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7681815Z traceback.print_stack() 2023-01-11T22:51:00.7682049Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7682326Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7682565Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7682777Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7683002Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.7683221Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7683437Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7683661Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7683881Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7684103Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7684327Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7684533Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7684752Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7684969Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7685185Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7685399Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7685617Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7685839Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7686054Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7686273Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7686477Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7686697Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.7686913Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7687126Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7687342Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7687559Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7687833Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7688056Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7688260Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7688480Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7688695Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7688907Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7689122Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7689340Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7689560Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7689780Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7690025Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7690248Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7690460Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.7690671Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7690883Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7691098Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7691319Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7691541Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7691764Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7691971Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7692226Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7692448Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7692554Z dist init r=1, world=2 2023-01-11T22:51:00.7692879Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.7693196Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.7693500Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.7693805Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.7694105Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.7694400Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.7694682Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.7694974Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.7695327Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.7695621Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.7695912Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.7696206Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.7696311Z dist init r=0, world=2 2023-01-11T22:51:00.7696832Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.7697229Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.7697539Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.7697835Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.7698118Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.7698410Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.7698709Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.7699004Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.7699298Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.7699587Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.7699878Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.7700175Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.7700274Z ok (6.814s) 2023-01-11T22:51:00.7700604Z test_nested_wrapped_model_offload_false_no_shard (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 94858 2023-01-11T22:51:00.7700816Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 94859 2023-01-11T22:51:00.7701183Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.7701353Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.7701728Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.7701911Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.7702273Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.7702519Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.7702899Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.7703084Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.7703310Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.7703550Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.7703942Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.7704330Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:51:00.7704553Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.7704775Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.7705058Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7705295Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7706309Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.7706423Z warnings.warn( 2023-01-11T22:51:00.7707437Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.7707550Z warnings.warn( 2023-01-11T22:51:00.7707764Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7707991Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7708212Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.7708441Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7708661Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7708882Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7709106Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7709325Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7709532Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7709748Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7709964Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7710184Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7710397Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7710670Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7710887Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7711112Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7711860Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7712602Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7713379Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7714120Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7714848Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7715576Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7716284Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.7717007Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7717732Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7718455Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7719170Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7719891Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7720660Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7721377Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7722130Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7722858Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7723570Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7724290Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7725011Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7725730Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7726447Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7727166Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7727876Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7728591Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.7729356Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7730074Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7730824Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7731550Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7732260Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7732975Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7733692Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7734408Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7735125Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7735842Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7736785Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7737527Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7737881Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7738115Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7738339Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7738565Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7738783Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7739005Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7739213Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7739436Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7739709Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7739940Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7740162Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7740384Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7740605Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7740822Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7741027Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.7741255Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7741473Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7741696Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7741914Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7742130Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7742345Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7742565Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7742780Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7742985Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7743725Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7744454Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7745172Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7745955Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7746673Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7747385Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7748137Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7748862Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7749578Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7750299Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7751011Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7751718Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7752435Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7753143Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.7753855Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7754621Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7755331Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7756043Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7756799Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7757520Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7758235Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7758954Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7759666Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7760377Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7761098Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7761810Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7762523Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7763290Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7764002Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7764715Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7765481Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.7766204Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7766914Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7767628Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7768346Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7769056Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7769770Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7770486Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7771202Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7771920Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7772687Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7773396Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7774148Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7774871Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7775583Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7776304Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7777245Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7777965Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.7778198Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7778437Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7778663Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7778876Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7779105Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7779334Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7779446Z dist init r=0, world=2 2023-01-11T22:51:00.7779555Z dist init r=1, world=2 2023-01-11T22:51:00.7779655Z ok (5.714s) 2023-01-11T22:51:00.7779983Z test_nested_wrapped_model_offload_false_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 94941 2023-01-11T22:51:00.7780197Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 94942 2023-01-11T22:51:00.7780641Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.7780813Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.7781183Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.7781365Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.7781722Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.7781891Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.7782259Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: 
UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.7782445Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.7782674Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.7782977Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.7783379Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.7783765Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.7783992Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.7784208Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.7784436Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7784663Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7785677Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.7785792Z warnings.warn( 2023-01-11T22:51:00.7786795Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. 
We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.7786907Z warnings.warn( 2023-01-11T22:51:00.7787117Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7787343Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7787568Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7787793Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7788009Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7788223Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7788444Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7788661Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7788920Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7789140Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7789366Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7789585Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7789805Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7790022Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.7790241Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7790456Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7791487Z /opt/conda/lib/python3.10/site-packages/torch/autograd/__init__.py:197: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:51:00.7791724Z Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2023-01-11T22:51:00.7792488Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7793226Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7793963Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7794692Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7795410Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7795646Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7795871Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7796095Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7796318Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7796540Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7796761Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7796984Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7797205Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7797469Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7797693Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7797910Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7798130Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.7798349Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7798572Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7798794Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7799008Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7799213Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7799432Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7799692Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7799920Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7800139Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7800358Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7800576Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7800795Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7801529Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7802264Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7802986Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7803704Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7804424Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7805146Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7805860Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7806636Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7807341Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7808056Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7808821Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7809055Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7809278Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7809500Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7809725Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7809950Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7810178Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.7810290Z dist init r=1, world=2 2023-01-11T22:51:00.7810394Z dist init r=0, world=2 2023-01-11T22:51:00.7810478Z ok (6.013s) 2023-01-11T22:51:00.7810819Z test_nested_wrapped_model_offload_false_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 95024 2023-01-11T22:51:00.7811034Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 95025 2023-01-11T22:51:00.7811407Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.7811577Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.7811947Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.7812139Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.7812503Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.7812658Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.7813031Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.7813216Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.7813455Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.7813694Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.7814081Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.7814529Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:51:00.7814754Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.7814977Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.7815190Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7815416Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7816425Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.7816752Z warnings.warn( 2023-01-11T22:51:00.7817859Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.7817979Z warnings.warn( 2023-01-11T22:51:00.7818213Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7818439Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7818667Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.7818900Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7819125Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7819330Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7819553Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7819774Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7819992Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7820211Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7820432Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7820649Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7820869Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7821075Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7821293Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7821510Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7822507Z /opt/conda/lib/python3.10/site-packages/torch/autograd/__init__.py:197: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 
2023-01-11T22:51:00.7822799Z Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2023-01-11T22:51:00.7823535Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7824265Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7824993Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7825780Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7826511Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7826741Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.7826966Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7827198Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7827425Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7827647Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7827852Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7828077Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7828296Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7828514Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7828730Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7828952Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7829170Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7829392Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7829595Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7829812Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7830036Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7830258Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7830474Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.7830693Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7830914Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7831187Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7831393Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7831607Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7831824Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7832554Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7833274Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7834047Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7834779Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7835498Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7836222Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7836942Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7837659Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7838381Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7839096Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7839812Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7840099Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7840326Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7840549Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7840771Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7840993Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7841201Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7841309Z dist init r=0, world=2 2023-01-11T22:51:00.7841413Z dist init r=1, world=2 2023-01-11T22:51:00.7841509Z ok (6.013s) 2023-01-11T22:51:00.7841837Z test_nested_wrapped_model_offload_true_no_shard (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 95107 2023-01-11T22:51:00.7842090Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 95108 2023-01-11T22:51:00.7842469Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.7842638Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.7842995Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.7843180Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.7843544Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.7843713Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.7844087Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.7844271Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.7844514Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.7844755Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.7845132Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.7845520Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:51:00.7845744Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.7845965Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.7846202Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7846428Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7847440Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.7847549Z warnings.warn( 2023-01-11T22:51:00.7848552Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:51:00.7848712Z warnings.warn( 2023-01-11T22:51:00.7848840Z File "", line 1, in 2023-01-11T22:51:00.7849032Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7849166Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7849364Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7849510Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7849716Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7849818Z self.run() 2023-01-11T22:51:00.7850017Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7850167Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7850495Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7850669Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7851038Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7851160Z getattr(self, test_name)() 2023-01-11T22:51:00.7851514Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7851606Z fn() 2023-01-11T22:51:00.7851962Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7852079Z test(self, **param_kwargs) 2023-01-11T22:51:00.7852417Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7852542Z return func(*args, **kwargs) 2023-01-11T22:51:00.7852791Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.7852902Z self.run_subtests( 2023-01-11T22:51:00.7853253Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7853415Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7853775Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7853920Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7854277Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7854391Z output = model(*input) 2023-01-11T22:51:00.7854711Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7854851Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7855217Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7855389Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7855753Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7855872Z _lazy_init(state, module) 2023-01-11T22:51:00.7856202Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7856368Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7856937Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7857083Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7857506Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7857629Z return func(*args, **kwargs) 2023-01-11T22:51:00.7858006Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7858104Z p_assert( 2023-01-11T22:51:00.7858422Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7858540Z traceback.print_stack() 2023-01-11T22:51:00.7858665Z File "", line 1, in 2023-01-11T22:51:00.7858869Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7859005Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7859200Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7859341Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7859537Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7859634Z self.run() 2023-01-11T22:51:00.7859888Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7860039Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7860376Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7860504Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7860859Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7860975Z getattr(self, test_name)() 2023-01-11T22:51:00.7861312Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7861408Z fn() 2023-01-11T22:51:00.7861770Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7861896Z test(self, **param_kwargs) 2023-01-11T22:51:00.7862248Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.7862368Z return func(*args, **kwargs) 2023-01-11T22:51:00.7862618Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.7862728Z self.run_subtests( 2023-01-11T22:51:00.7863056Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7863217Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7863576Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7863725Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7864094Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7864218Z output = model(*input) 2023-01-11T22:51:00.7864540Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7864672Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7865025Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7865192Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7865549Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7865664Z _lazy_init(state, module) 2023-01-11T22:51:00.7866013Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7866179Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7866624Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7866765Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.7867084Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7867206Z return func(*args, **kwargs) 2023-01-11T22:51:00.7867574Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7867674Z p_assert( 2023-01-11T22:51:00.7868000Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7868124Z traceback.print_stack() 2023-01-11T22:51:00.7868355Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7868583Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7868700Z File "", line 1, in 2023-01-11T22:51:00.7868905Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7869124Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7869328Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7869473Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7869679Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7869779Z self.run() 2023-01-11T22:51:00.7869980Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7870108Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7870444Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7870576Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7870939Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7871064Z getattr(self, test_name)() 2023-01-11T22:51:00.7871422Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7871516Z fn() 2023-01-11T22:51:00.7871874Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7871980Z test(self, **param_kwargs) 2023-01-11T22:51:00.7872330Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7872446Z return func(*args, **kwargs) 2023-01-11T22:51:00.7872693Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.7872806Z self.run_subtests( 2023-01-11T22:51:00.7873148Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7873303Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7873648Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7873796Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7874167Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7874286Z output = model(*input) 2023-01-11T22:51:00.7874605Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7874742Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7875108Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7875333Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7875697Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.7875803Z _lazy_init(state, module) 2023-01-11T22:51:00.7876151Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7876317Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7876712Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7876852Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7877184Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7877307Z return func(*args, **kwargs) 2023-01-11T22:51:00.7877678Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7877767Z p_assert( 2023-01-11T22:51:00.7878138Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7878265Z traceback.print_stack() 2023-01-11T22:51:00.7878389Z File "", line 1, in 2023-01-11T22:51:00.7878594Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7878732Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7878927Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7879060Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7879267Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7879370Z self.run() 2023-01-11T22:51:00.7879569Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7879718Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7880054Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7880188Z 
self.run_test(test_name, pipe) 2023-01-11T22:51:00.7880543Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7880649Z getattr(self, test_name)() 2023-01-11T22:51:00.7881002Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7881098Z fn() 2023-01-11T22:51:00.7881455Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7881576Z test(self, **param_kwargs) 2023-01-11T22:51:00.7881924Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7882048Z return func(*args, **kwargs) 2023-01-11T22:51:00.7882293Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.7882392Z self.run_subtests( 2023-01-11T22:51:00.7882735Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7882889Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7883250Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7883393Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7883764Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7883876Z output = model(*input) 2023-01-11T22:51:00.7884194Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7884369Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7884744Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:51:00.7884915Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7885273Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7885392Z _lazy_init(state, module) 2023-01-11T22:51:00.7885735Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7885899Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7886289Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7886413Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7886749Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7886870Z return func(*args, **kwargs) 2023-01-11T22:51:00.7887294Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7887403Z p_assert( 2023-01-11T22:51:00.7887736Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7887860Z traceback.print_stack() 2023-01-11T22:51:00.7888090Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7888305Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.7888430Z File "", line 1, in 2023-01-11T22:51:00.7888634Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7888776Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7888970Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7889119Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7889328Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7889427Z self.run() 2023-01-11T22:51:00.7889609Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7889749Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7890086Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7890214Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7890567Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7890687Z getattr(self, test_name)() 2023-01-11T22:51:00.7891039Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7891123Z fn() 2023-01-11T22:51:00.7891485Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7891606Z test(self, **param_kwargs) 2023-01-11T22:51:00.7891954Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7892077Z return func(*args, **kwargs) 2023-01-11T22:51:00.7892376Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.7892491Z self.run_subtests( 2023-01-11T22:51:00.7892840Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7892983Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7893406Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7893552Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7893927Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7894042Z output = model(*input) 2023-01-11T22:51:00.7894359Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7894495Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7894867Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7895022Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7895375Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7895496Z _lazy_init(state, module) 2023-01-11T22:51:00.7895842Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7896054Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7896456Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7896850Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7897195Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7897313Z return func(*args, **kwargs) 2023-01-11T22:51:00.7897670Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7897767Z p_assert( 2023-01-11T22:51:00.7898097Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7898224Z traceback.print_stack() 2023-01-11T22:51:00.7898345Z File "", line 1, in 2023-01-11T22:51:00.7898551Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7898686Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7898869Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7899013Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7899217Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7899316Z self.run() 2023-01-11T22:51:00.7899512Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7899651Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7899985Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7900109Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7900451Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7900569Z getattr(self, test_name)() 2023-01-11T22:51:00.7900930Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7901024Z fn() 2023-01-11T22:51:00.7901382Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7901501Z test(self, **param_kwargs) 2023-01-11T22:51:00.7901849Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.7901968Z return func(*args, **kwargs) 2023-01-11T22:51:00.7902202Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.7902310Z self.run_subtests( 2023-01-11T22:51:00.7902748Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7902908Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7903265Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7903412Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7903781Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7903899Z output = model(*input) 2023-01-11T22:51:00.7904204Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7904336Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7904708Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7904878Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7905299Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7905421Z _lazy_init(state, module) 2023-01-11T22:51:00.7905768Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7905928Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7906307Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7906446Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.7906780Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7906902Z return func(*args, **kwargs) 2023-01-11T22:51:00.7907270Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7907376Z p_assert( 2023-01-11T22:51:00.7907709Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7907830Z traceback.print_stack() 2023-01-11T22:51:00.7908048Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7908277Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7909016Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7909748Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.7909882Z File "", line 1, in 2023-01-11T22:51:00.7910088Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7910224Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7910423Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7910570Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7910762Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7910860Z self.run() 2023-01-11T22:51:00.7911055Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7911193Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7911586Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7911718Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7912079Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7912199Z getattr(self, test_name)() 2023-01-11T22:51:00.7912538Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7912630Z fn() 2023-01-11T22:51:00.7912985Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7913106Z test(self, **param_kwargs) 2023-01-11T22:51:00.7913454Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7913574Z return func(*args, **kwargs) 2023-01-11T22:51:00.7913826Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.7913935Z self.run_subtests( 2023-01-11T22:51:00.7914310Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7914473Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7914831Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7914980Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7915350Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7915469Z output = model(*input) 2023-01-11T22:51:00.7915790Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7915926Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7916289Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7916460Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7916825Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7916942Z _lazy_init(state, module) 2023-01-11T22:51:00.7917289Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7917452Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7917846Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7917980Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7918300Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7918426Z return func(*args, **kwargs) 2023-01-11T22:51:00.7918798Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7918896Z p_assert( 2023-01-11T22:51:00.7919224Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7919345Z traceback.print_stack() 2023-01-11T22:51:00.7919472Z File "", line 1, in 2023-01-11T22:51:00.7919679Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7919802Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7919998Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7920144Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7920350Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7920503Z self.run() 2023-01-11T22:51:00.7920699Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7920846Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7921185Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7921300Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7921651Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7921767Z getattr(self, test_name)() 2023-01-11T22:51:00.7922123Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7922218Z fn() 2023-01-11T22:51:00.7922577Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7922697Z test(self, **param_kwargs) 2023-01-11T22:51:00.7923034Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.7923196Z return func(*args, **kwargs) 2023-01-11T22:51:00.7923451Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.7923557Z self.run_subtests( 2023-01-11T22:51:00.7923903Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7924059Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7924415Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7924566Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7924920Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7925041Z output = model(*input) 2023-01-11T22:51:00.7925359Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7925496Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7925870Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7926041Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7926399Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7926516Z _lazy_init(state, module) 2023-01-11T22:51:00.7926862Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7927011Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7927403Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7927545Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.7927880Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7927999Z return func(*args, **kwargs) 2023-01-11T22:51:00.7928368Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7928466Z p_assert( 2023-01-11T22:51:00.7928798Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7928906Z traceback.print_stack() 2023-01-11T22:51:00.7929138Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7929366Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7929545Z File "", line 1, in 2023-01-11T22:51:00.7929752Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7929887Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7930086Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7930219Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7930425Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7930522Z self.run() 2023-01-11T22:51:00.7930717Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7930857Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7931193Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7931324Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7931677Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7931786Z getattr(self, test_name)() 2023-01-11T22:51:00.7932186Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7932285Z fn() 2023-01-11T22:51:00.7932646Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7932765Z test(self, **param_kwargs) 2023-01-11T22:51:00.7933110Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7933233Z return func(*args, **kwargs) 2023-01-11T22:51:00.7933480Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.7933575Z self.run_subtests( 2023-01-11T22:51:00.7933917Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7934079Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7934444Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7934593Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7934966Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7935087Z output = model(*input) 2023-01-11T22:51:00.7935407Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7935527Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7935893Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7936063Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7936423Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.7936758Z _lazy_init(state, module) 2023-01-11T22:51:00.7937136Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7937302Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7937695Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7937818Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7938151Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7938277Z return func(*args, **kwargs) 2023-01-11T22:51:00.7938645Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7938827Z p_assert( 2023-01-11T22:51:00.7939162Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7939292Z traceback.print_stack() 2023-01-11T22:51:00.7939419Z File "", line 1, in 2023-01-11T22:51:00.7939608Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7939743Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7939940Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7940086Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7940291Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7940392Z self.run() 2023-01-11T22:51:00.7940588Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7940716Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7941055Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7941183Z 
self.run_test(test_name, pipe) 2023-01-11T22:51:00.7941595Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7941719Z getattr(self, test_name)() 2023-01-11T22:51:00.7942074Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7942167Z fn() 2023-01-11T22:51:00.7942522Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7942627Z test(self, **param_kwargs) 2023-01-11T22:51:00.7942976Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7943096Z return func(*args, **kwargs) 2023-01-11T22:51:00.7943343Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.7943455Z self.run_subtests( 2023-01-11T22:51:00.7943802Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7943962Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7944320Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7944455Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7944820Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7944936Z output = model(*input) 2023-01-11T22:51:00.7945257Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7945393Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7945761Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:51:00.7945934Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7946297Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7946399Z _lazy_init(state, module) 2023-01-11T22:51:00.7946749Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7946914Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7947307Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7947446Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7947778Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7947956Z return func(*args, **kwargs) 2023-01-11T22:51:00.7948335Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7948434Z p_assert( 2023-01-11T22:51:00.7948751Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7948871Z traceback.print_stack() 2023-01-11T22:51:00.7949098Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7949326Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.7949449Z File "", line 1, in 2023-01-11T22:51:00.7949652Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7949791Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7949976Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7950124Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7950391Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7950497Z self.run() 2023-01-11T22:51:00.7950696Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7950838Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7951173Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7951297Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7951637Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7951754Z getattr(self, test_name)() 2023-01-11T22:51:00.7952100Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7952201Z fn() 2023-01-11T22:51:00.7952556Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7952680Z test(self, **param_kwargs) 2023-01-11T22:51:00.7953029Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7953150Z return func(*args, **kwargs) 2023-01-11T22:51:00.7953381Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.7953495Z self.run_subtests( 2023-01-11T22:51:00.7953839Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7953997Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7954354Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7954508Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7954882Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7955001Z output = model(*input) 2023-01-11T22:51:00.7955309Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7955444Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7955817Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7955989Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7956348Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7956464Z _lazy_init(state, module) 2023-01-11T22:51:00.7956879Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7957043Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7957424Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7957563Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7957892Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7958011Z return func(*args, **kwargs) 2023-01-11T22:51:00.7958383Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7958482Z p_assert( 2023-01-11T22:51:00.7958812Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7958937Z traceback.print_stack() 2023-01-11T22:51:00.7959051Z File "", line 1, in 2023-01-11T22:51:00.7959255Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7959440Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7959641Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7959785Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7959991Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7960093Z self.run() 2023-01-11T22:51:00.7960277Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7960420Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7960757Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7960885Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7961237Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7961358Z getattr(self, test_name)() 2023-01-11T22:51:00.7961717Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7961812Z fn() 2023-01-11T22:51:00.7962157Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7962274Z test(self, **param_kwargs) 2023-01-11T22:51:00.7962622Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.7962739Z return func(*args, **kwargs) 2023-01-11T22:51:00.7962984Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.7963094Z self.run_subtests( 2023-01-11T22:51:00.7963438Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7963600Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7963946Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7964097Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7964462Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7964580Z output = model(*input) 2023-01-11T22:51:00.7964901Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7965040Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7965410Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7965578Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7965981Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7966103Z _lazy_init(state, module) 2023-01-11T22:51:00.7966453Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7966619Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7967007Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7967143Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.7967475Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7967594Z return func(*args, **kwargs) 2023-01-11T22:51:00.7967951Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7968053Z p_assert( 2023-01-11T22:51:00.7968381Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7968544Z traceback.print_stack() 2023-01-11T22:51:00.7968782Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7969007Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7969132Z File "", line 1, in 2023-01-11T22:51:00.7969337Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7969462Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7969657Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7969807Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7969999Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7970102Z self.run() 2023-01-11T22:51:00.7970299Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7970444Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7970783Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7970913Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7971267Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7971390Z getattr(self, test_name)() 2023-01-11T22:51:00.7971729Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7971823Z fn() 2023-01-11T22:51:00.7972180Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7972303Z test(self, **param_kwargs) 2023-01-11T22:51:00.7972654Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7972778Z return func(*args, **kwargs) 2023-01-11T22:51:00.7973024Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.7973130Z self.run_subtests( 2023-01-11T22:51:00.7973459Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7973614Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7973970Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7974111Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7974475Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7974641Z output = model(*input) 2023-01-11T22:51:00.7974959Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7975096Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7975450Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7975617Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7975975Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.7976089Z _lazy_init(state, module) 2023-01-11T22:51:00.7976432Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7976768Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7977174Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7977317Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7977705Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7977837Z return func(*args, **kwargs) 2023-01-11T22:51:00.7978214Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7978309Z p_assert( 2023-01-11T22:51:00.7978636Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7978760Z traceback.print_stack() 2023-01-11T22:51:00.7978889Z File "", line 1, in 2023-01-11T22:51:00.7979094Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7979217Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7979417Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7979562Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7979767Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7979868Z self.run() 2023-01-11T22:51:00.7980067Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7980208Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7980544Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7980708Z 
self.run_test(test_name, pipe) 2023-01-11T22:51:00.7981071Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7981194Z getattr(self, test_name)() 2023-01-11T22:51:00.7981550Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7981649Z fn() 2023-01-11T22:51:00.7982013Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7982134Z test(self, **param_kwargs) 2023-01-11T22:51:00.7982469Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.7982588Z return func(*args, **kwargs) 2023-01-11T22:51:00.7982831Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.7982940Z self.run_subtests( 2023-01-11T22:51:00.7983282Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7983436Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7983795Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7984026Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7984388Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7984503Z output = model(*input) 2023-01-11T22:51:00.7984818Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7984952Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7985322Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:51:00.7985488Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7985842Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7985958Z _lazy_init(state, module) 2023-01-11T22:51:00.7986306Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7986456Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7986888Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7987030Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.7987363Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7987484Z return func(*args, **kwargs) 2023-01-11T22:51:00.7987853Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7987951Z p_assert( 2023-01-11T22:51:00.7988281Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.7988394Z traceback.print_stack() 2023-01-11T22:51:00.7988619Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7988852Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.7989594Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.7990328Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.7990457Z File "", line 1, in 2023-01-11T22:51:00.7990669Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.7990805Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.7991010Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.7991143Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.7991351Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.7991451Z self.run() 2023-01-11T22:51:00.7991649Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.7991791Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.7992132Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.7992305Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.7992665Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.7992828Z getattr(self, test_name)() 2023-01-11T22:51:00.7993187Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.7993284Z fn() 2023-01-11T22:51:00.7993645Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.7993769Z test(self, **param_kwargs) 2023-01-11T22:51:00.7994120Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 
167, in wrapper 2023-01-11T22:51:00.7994245Z return func(*args, **kwargs) 2023-01-11T22:51:00.7994493Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.7994588Z self.run_subtests( 2023-01-11T22:51:00.7994931Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.7995094Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.7995455Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.7995655Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.7996035Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.7996153Z output = model(*input) 2023-01-11T22:51:00.7996473Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.7996593Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.7996967Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.7997139Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.7997496Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.7997618Z _lazy_init(state, module) 2023-01-11T22:51:00.7997964Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.7998126Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.7998515Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.7998639Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.7998972Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.7999091Z return func(*args, **kwargs) 2023-01-11T22:51:00.7999466Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.7999563Z p_assert( 2023-01-11T22:51:00.7999898Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8000022Z traceback.print_stack() 2023-01-11T22:51:00.8000151Z File "", line 1, in 2023-01-11T22:51:00.8000341Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8000479Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8000678Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8000822Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8001026Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8001126Z self.run() 2023-01-11T22:51:00.8001322Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8001449Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8001785Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8001970Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8002334Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8002453Z getattr(self, test_name)() 2023-01-11T22:51:00.8002807Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8002901Z fn() 2023-01-11T22:51:00.8003257Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8003363Z test(self, **param_kwargs) 2023-01-11T22:51:00.8003713Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8003834Z return func(*args, **kwargs) 2023-01-11T22:51:00.8004082Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8004200Z self.run_subtests( 2023-01-11T22:51:00.8004587Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8004753Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8005109Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8005242Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8005608Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8005725Z output = model(*input) 2023-01-11T22:51:00.8006049Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8006183Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8006554Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8006723Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8007084Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8007186Z _lazy_init(state, module) 2023-01-11T22:51:00.8007528Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 
2023-01-11T22:51:00.8007690Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8008086Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8008225Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8008556Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8008682Z return func(*args, **kwargs) 2023-01-11T22:51:00.8009053Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8009140Z p_assert( 2023-01-11T22:51:00.8009477Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8009601Z traceback.print_stack() 2023-01-11T22:51:00.8009830Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8010061Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.8010188Z File "", line 1, in 2023-01-11T22:51:00.8010393Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8010528Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8010709Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8010939Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8011149Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8011256Z self.run() 2023-01-11T22:51:00.8011451Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8011590Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8011928Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8012058Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8012398Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8012516Z getattr(self, test_name)() 2023-01-11T22:51:00.8012869Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8012964Z fn() 2023-01-11T22:51:00.8013320Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8013438Z test(self, **param_kwargs) 2023-01-11T22:51:00.8013846Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8013971Z return func(*args, **kwargs) 2023-01-11T22:51:00.8014205Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8014320Z self.run_subtests( 2023-01-11T22:51:00.8014666Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8014822Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8015181Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8015332Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8015703Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8015823Z output = model(*input) 2023-01-11T22:51:00.8016128Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8016261Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8016811Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8016989Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8017353Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8017469Z _lazy_init(state, module) 2023-01-11T22:51:00.8017814Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8017982Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8018363Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8018501Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8018831Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8018952Z return func(*args, **kwargs) 2023-01-11T22:51:00.8019322Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8019424Z p_assert( 2023-01-11T22:51:00.8019753Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8019872Z traceback.print_stack() 2023-01-11T22:51:00.8019982Z File "", line 1, in 2023-01-11T22:51:00.8020275Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8020414Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8020620Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8020769Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8020977Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8021076Z self.run() 2023-01-11T22:51:00.8021260Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8021401Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8021737Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8021864Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8022214Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8022337Z getattr(self, test_name)() 2023-01-11T22:51:00.8022748Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8022848Z fn() 2023-01-11T22:51:00.8023196Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8023312Z test(self, **param_kwargs) 2023-01-11T22:51:00.8023662Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.8023782Z return func(*args, **kwargs) 2023-01-11T22:51:00.8024030Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8024139Z self.run_subtests( 2023-01-11T22:51:00.8024483Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8024643Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8024988Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8025138Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8025506Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8025620Z output = model(*input) 2023-01-11T22:51:00.8025939Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8026071Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8026437Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8026607Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8026955Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8027073Z _lazy_init(state, module) 2023-01-11T22:51:00.8027423Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8027587Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8027981Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8028119Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.8028451Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8028572Z return func(*args, **kwargs) 2023-01-11T22:51:00.8028929Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8029082Z p_assert( 2023-01-11T22:51:00.8029415Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8029535Z traceback.print_stack() 2023-01-11T22:51:00.8029768Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8029997Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8030123Z File "", line 1, in 2023-01-11T22:51:00.8030326Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8030450Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8030640Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8030788Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8030997Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8031100Z self.run() 2023-01-11T22:51:00.8031296Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8031437Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8031816Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8031938Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8032291Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8032408Z getattr(self, test_name)() 2023-01-11T22:51:00.8032760Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8032852Z fn() 2023-01-11T22:51:00.8033207Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8033324Z test(self, **param_kwargs) 2023-01-11T22:51:00.8033664Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8033782Z return func(*args, **kwargs) 2023-01-11T22:51:00.8034032Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8034139Z self.run_subtests( 2023-01-11T22:51:00.8034486Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8034644Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8035001Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8035148Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8035503Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8035625Z output = model(*input) 2023-01-11T22:51:00.8035943Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8036076Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8036449Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8036619Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8036977Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.8037095Z _lazy_init(state, module) 2023-01-11T22:51:00.8037441Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8037591Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8037978Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8038168Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8038506Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8038629Z return func(*args, **kwargs) 2023-01-11T22:51:00.8039001Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8039100Z p_assert( 2023-01-11T22:51:00.8039427Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8039535Z traceback.print_stack() 2023-01-11T22:51:00.8039659Z File "", line 1, in 2023-01-11T22:51:00.8039860Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8039997Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8040193Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8040343Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8040594Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8040686Z self.run() 2023-01-11T22:51:00.8040883Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8041025Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8041358Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8041488Z 
self.run_test(test_name, pipe) 2023-01-11T22:51:00.8041842Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8041962Z getattr(self, test_name)() 2023-01-11T22:51:00.8042314Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8042398Z fn() 2023-01-11T22:51:00.8042755Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8042880Z test(self, **param_kwargs) 2023-01-11T22:51:00.8043226Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8043341Z return func(*args, **kwargs) 2023-01-11T22:51:00.8043588Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8043698Z self.run_subtests( 2023-01-11T22:51:00.8044044Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8044186Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8044542Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8044692Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8045062Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8045175Z output = model(*input) 2023-01-11T22:51:00.8045493Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8045623Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8045992Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:51:00.8046147Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8046502Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8046621Z _lazy_init(state, module) 2023-01-11T22:51:00.8046966Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8047177Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8047576Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8047715Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8048051Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8048157Z return func(*args, **kwargs) 2023-01-11T22:51:00.8048526Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8048622Z p_assert( 2023-01-11T22:51:00.8048953Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8049073Z traceback.print_stack() 2023-01-11T22:51:00.8049301Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8049533Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.8049701Z File "", line 1, in 2023-01-11T22:51:00.8049899Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8050039Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8050232Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8050377Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8050583Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8050683Z self.run() 2023-01-11T22:51:00.8050877Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8051006Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8051340Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8051471Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8051822Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8051939Z getattr(self, test_name)() 2023-01-11T22:51:00.8052291Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8052386Z fn() 2023-01-11T22:51:00.8052744Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8052850Z test(self, **param_kwargs) 2023-01-11T22:51:00.8053199Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8053315Z return func(*args, **kwargs) 2023-01-11T22:51:00.8053563Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8053676Z self.run_subtests( 2023-01-11T22:51:00.8054027Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8054185Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8054539Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8054673Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8055042Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8055157Z output = model(*input) 2023-01-11T22:51:00.8055473Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8055604Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8056027Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8056195Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8056782Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8056899Z _lazy_init(state, module) 2023-01-11T22:51:00.8057257Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8057419Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8057811Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8057946Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8058280Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8058408Z return func(*args, **kwargs) 2023-01-11T22:51:00.8058773Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8058954Z p_assert( 2023-01-11T22:51:00.8059287Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8059407Z traceback.print_stack() 2023-01-11T22:51:00.8059532Z File "", line 1, in 2023-01-11T22:51:00.8059736Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8059872Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8060068Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8060215Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8060407Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8060510Z self.run() 2023-01-11T22:51:00.8060706Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8060849Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8061184Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8061314Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8061668Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8061789Z getattr(self, test_name)() 2023-01-11T22:51:00.8062129Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8062223Z fn() 2023-01-11T22:51:00.8062581Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8062704Z test(self, **param_kwargs) 2023-01-11T22:51:00.8063060Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.8063181Z return func(*args, **kwargs) 2023-01-11T22:51:00.8063431Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8063540Z self.run_subtests( 2023-01-11T22:51:00.8063869Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8064027Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8064385Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8064530Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8064899Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8065085Z output = model(*input) 2023-01-11T22:51:00.8065405Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8065544Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8065900Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8066072Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8066435Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8066551Z _lazy_init(state, module) 2023-01-11T22:51:00.8066895Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8067057Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8067444Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8067586Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.8067952Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8068080Z return func(*args, **kwargs) 2023-01-11T22:51:00.8068452Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8068549Z p_assert( 2023-01-11T22:51:00.8068879Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8069004Z traceback.print_stack() 2023-01-11T22:51:00.8069237Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8069466Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8070198Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8070940Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8071677Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8072411Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8073142Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8073866Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8074591Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8075379Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.8075503Z File "", line 1, in 2023-01-11T22:51:00.8075714Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8075852Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8076046Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8076193Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8076405Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8076511Z self.run() 2023-01-11T22:51:00.8076751Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8076903Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8077239Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8077365Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8077722Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8077847Z getattr(self, test_name)() 2023-01-11T22:51:00.8078203Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8078283Z fn() 2023-01-11T22:51:00.8078641Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8078763Z test(self, **param_kwargs) 2023-01-11T22:51:00.8079120Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8079241Z return func(*args, **kwargs) 2023-01-11T22:51:00.8079490Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8079603Z self.run_subtests( 2023-01-11T22:51:00.8079945Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8080088Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8080444Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8080592Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8080962Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8081076Z output = model(*input) 2023-01-11T22:51:00.8081400Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8081536Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8081907Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8082073Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8082420Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8082540Z _lazy_init(state, module) 2023-01-11T22:51:00.8082885Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8083104Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8083498Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8083643Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8083973Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8084095Z return func(*args, **kwargs) 2023-01-11T22:51:00.8084456Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8084557Z p_assert( 2023-01-11T22:51:00.8084889Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8085012Z traceback.print_stack() 2023-01-11T22:51:00.8085137Z File "", line 1, in 2023-01-11T22:51:00.8085340Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8085479Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8085663Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8085855Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8086065Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8086165Z self.run() 2023-01-11T22:51:00.8086363Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8086506Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8086843Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8086977Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8087317Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8087441Z getattr(self, test_name)() 2023-01-11T22:51:00.8087793Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8087886Z fn() 2023-01-11T22:51:00.8088244Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8088363Z test(self, **param_kwargs) 2023-01-11T22:51:00.8088710Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.8088829Z return func(*args, **kwargs) 2023-01-11T22:51:00.8089057Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8089165Z self.run_subtests( 2023-01-11T22:51:00.8089510Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8089670Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8090027Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8090177Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8090549Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8090665Z output = model(*input) 2023-01-11T22:51:00.8090970Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8091106Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8091475Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8091640Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8091999Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8092168Z _lazy_init(state, module) 2023-01-11T22:51:00.8092575Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8092736Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8093114Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8093251Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.8093581Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8093701Z return func(*args, **kwargs) 2023-01-11T22:51:00.8094074Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8094175Z p_assert( 2023-01-11T22:51:00.8094505Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8094627Z traceback.print_stack() 2023-01-11T22:51:00.8094897Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8095134Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8095360Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8095585Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8095806Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8096028Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8096247Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8096472Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8096862Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8097095Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8097317Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.8097538Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8097761Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8097978Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8098193Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8098409Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8098624Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8098831Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8099051Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8099268Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8099486Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8099702Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8099920Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8100140Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8101183Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_exec_order_utils.py:237: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:51:00.8101406Z (rank, world_num_valid_indices[rank]) 2023-01-11T22:51:00.8102419Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_exec_order_utils.py:237: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:51:00.8102553Z (rank, world_num_valid_indices[rank]) 2023-01-11T22:51:00.8102765Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8102993Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8103268Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8103496Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8103717Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8103938Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8104160Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8104379Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8104585Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8104806Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.8105027Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8105250Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8105468Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8105686Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8105907Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8106122Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8106326Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8106543Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8106765Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8106987Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8107204Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8107419Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8107640Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8107857Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8108602Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.8109390Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8110119Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8110851Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8111614Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8112343Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8113067Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8113797Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8114520Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8115224Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8115948Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8116666Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8117388Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8118165Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8118882Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8119598Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8120359Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8121082Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.8121796Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8122516Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8123235Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8123946Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8124666Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8125381Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8126093Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8126870Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8127585Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8128302Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8129105Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8129828Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8130539Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8131257Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8131977Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8132689Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8133406Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.8134115Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8134827Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8135589Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8136304Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8137331Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8138145Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8138874Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8139586Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8140303Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8141016Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8141724Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8142441Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8143148Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8143854Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8144639Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8145350Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8146060Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.8146175Z dist init r=0, world=2 2023-01-11T22:51:00.8146502Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.8146885Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.8147192Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.8147489Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.8147782Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.8148065Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.8148363Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.8148658Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.8148947Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.8149245Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 
2023-01-11T22:51:00.8149540Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.8149839Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.8149950Z dist init r=1, world=2 2023-01-11T22:51:00.8150271Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.8150580Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.8150884Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.8151186Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.8151523Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.8151818Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.8152111Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.8152405Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 
2023-01-11T22:51:00.8152698Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.8152997Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.8153333Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.8153634Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.8153732Z ok (6.114s) 2023-01-11T22:51:00.8154056Z test_nested_wrapped_model_offload_true_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 95190 2023-01-11T22:51:00.8154272Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 95191 2023-01-11T22:51:00.8154636Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.8154813Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.8155192Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.8155379Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.8155739Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.8155908Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.8156281Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 
2023-01-11T22:51:00.8156468Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.8156696Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.8156937Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.8157334Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.8157727Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.8157954Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.8158177Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.8158409Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8158633Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8159641Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.8159805Z warnings.warn( 2023-01-11T22:51:00.8160813Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. 
We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.8160908Z warnings.warn( 2023-01-11T22:51:00.8161032Z File "", line 1, in 2023-01-11T22:51:00.8161241Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8161380Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8161621Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8161773Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8161982Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8162083Z self.run() 2023-01-11T22:51:00.8162267Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8162403Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8162737Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8162867Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8163216Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8163340Z getattr(self, test_name)() 2023-01-11T22:51:00.8163696Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8163795Z fn() 2023-01-11T22:51:00.8164151Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8164269Z test(self, **param_kwargs) 2023-01-11T22:51:00.8164603Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8164722Z return func(*args, **kwargs) 
2023-01-11T22:51:00.8164962Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8165081Z self.run_subtests( 2023-01-11T22:51:00.8165429Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8165593Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8165953Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8166107Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8166482Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8166600Z output = model(*input) 2023-01-11T22:51:00.8166904Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8167040Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8167410Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8175150Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8175610Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8175841Z _lazy_init(state, module) 2023-01-11T22:51:00.8176211Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8176377Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8177082Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8177226Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8177562Z File 
"/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8177683Z return func(*args, **kwargs) 2023-01-11T22:51:00.8178058Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8178158Z p_assert( 2023-01-11T22:51:00.8178495Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8178619Z traceback.print_stack() 2023-01-11T22:51:00.8178842Z File "", line 1, in 2023-01-11T22:51:00.8179046Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8179189Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8179386Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8179530Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8179738Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8179836Z self.run() 2023-01-11T22:51:00.8180030Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8180170Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8180496Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8180628Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8180987Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8181104Z getattr(self, test_name)() 2023-01-11T22:51:00.8181457Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8181551Z fn() 2023-01-11T22:51:00.8181907Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 
2023-01-11T22:51:00.8182025Z test(self, **param_kwargs) 2023-01-11T22:51:00.8182363Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8182481Z return func(*args, **kwargs) 2023-01-11T22:51:00.8182727Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8182838Z self.run_subtests( 2023-01-11T22:51:00.8183186Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8183344Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8183702Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8183848Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8184203Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8184317Z output = model(*input) 2023-01-11T22:51:00.8184634Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8184768Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8185225Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8185393Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8185756Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8185874Z _lazy_init(state, module) 2023-01-11T22:51:00.8186207Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8186371Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8186757Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8186893Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8187221Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8187344Z return func(*args, **kwargs) 2023-01-11T22:51:00.8187712Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8187856Z p_assert( 2023-01-11T22:51:00.8188183Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8188304Z traceback.print_stack() 2023-01-11T22:51:00.8188534Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8188763Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8188887Z File "", line 1, in 2023-01-11T22:51:00.8189086Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8189223Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8189414Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8189551Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8189756Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8189857Z self.run() 2023-01-11T22:51:00.8190050Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8190189Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8190525Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8190655Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8190994Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8191113Z getattr(self, test_name)() 2023-01-11T22:51:00.8191464Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8191557Z fn() 2023-01-11T22:51:00.8191911Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8192025Z test(self, **param_kwargs) 2023-01-11T22:51:00.8192424Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8192548Z return func(*args, **kwargs) 2023-01-11T22:51:00.8192783Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8192894Z self.run_subtests( 2023-01-11T22:51:00.8193242Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8193399Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8193755Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8193962Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8194333Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8194446Z output = model(*input) 2023-01-11T22:51:00.8194752Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8194885Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8195250Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8195422Z args, kwargs = 
_root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8195781Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8195897Z _lazy_init(state, module) 2023-01-11T22:51:00.8196241Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8196403Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8196823Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8196963Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8197294Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8197414Z return func(*args, **kwargs) 2023-01-11T22:51:00.8197783Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8197882Z p_assert( 2023-01-11T22:51:00.8198211Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8198333Z traceback.print_stack() 2023-01-11T22:51:00.8198444Z File "", line 1, in 2023-01-11T22:51:00.8198653Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8198792Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8198989Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8199132Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8199340Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8199442Z self.run() 2023-01-11T22:51:00.8199641Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8199770Z self._target(*self._args, 
**self._kwargs) 2023-01-11T22:51:00.8200100Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8200228Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8200585Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8200708Z getattr(self, test_name)() 2023-01-11T22:51:00.8201067Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8201164Z fn() 2023-01-11T22:51:00.8201508Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8201630Z test(self, **param_kwargs) 2023-01-11T22:51:00.8201975Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8202093Z return func(*args, **kwargs) 2023-01-11T22:51:00.8202337Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8202448Z self.run_subtests( 2023-01-11T22:51:00.8202796Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8203012Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8203364Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8203516Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8203886Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8204003Z output = model(*input) 2023-01-11T22:51:00.8204326Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8204457Z 
return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8204825Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8204992Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8205353Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8205456Z _lazy_init(state, module) 2023-01-11T22:51:00.8205861Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8206028Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8206422Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8206558Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8206886Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8207007Z return func(*args, **kwargs) 2023-01-11T22:51:00.8207374Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8207464Z p_assert( 2023-01-11T22:51:00.8207791Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8207909Z traceback.print_stack() 2023-01-11T22:51:00.8208142Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8208368Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.8208493Z File "", line 1, in 2023-01-11T22:51:00.8208693Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8208816Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8209009Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8209152Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8209358Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8209461Z self.run() 2023-01-11T22:51:00.8209653Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8209792Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8210130Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8210244Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8210597Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8210714Z getattr(self, test_name)() 2023-01-11T22:51:00.8211063Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8211155Z fn() 2023-01-11T22:51:00.8211506Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8211627Z test(self, **param_kwargs) 2023-01-11T22:51:00.8212030Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8212136Z return func(*args, **kwargs) 2023-01-11T22:51:00.8212386Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8212499Z self.run_subtests( 2023-01-11T22:51:00.8212844Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8213002Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8213356Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8213502Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8213869Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8213975Z output = model(*input) 2023-01-11T22:51:00.8214293Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8214422Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8214836Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8215011Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8215371Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8215484Z _lazy_init(state, module) 2023-01-11T22:51:00.8215824Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8215973Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8216359Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8216497Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8217078Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8217201Z return func(*args, **kwargs) 2023-01-11T22:51:00.8217569Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8217669Z p_assert( 2023-01-11T22:51:00.8217999Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8218107Z traceback.print_stack() 2023-01-11T22:51:00.8218229Z File "", line 1, in 2023-01-11T22:51:00.8218430Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8218567Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8218763Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8218915Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8219123Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8219227Z self.run() 2023-01-11T22:51:00.8219410Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8219548Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8219881Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8220006Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8220361Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8220476Z getattr(self, test_name)() 2023-01-11T22:51:00.8220827Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8220993Z fn() 2023-01-11T22:51:00.8221357Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8221480Z test(self, **param_kwargs) 2023-01-11T22:51:00.8221825Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.8221943Z return func(*args, **kwargs) 2023-01-11T22:51:00.8222189Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8222301Z self.run_subtests( 2023-01-11T22:51:00.8222642Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8222784Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8223137Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8223289Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8223716Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8223839Z output = model(*input) 2023-01-11T22:51:00.8224163Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8224298Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8224669Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8224825Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8225185Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8225301Z _lazy_init(state, module) 2023-01-11T22:51:00.8225647Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8225812Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8226204Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8226341Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.8226670Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8226790Z return func(*args, **kwargs) 2023-01-11T22:51:00.8227146Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8227242Z p_assert( 2023-01-11T22:51:00.8227570Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8227691Z traceback.print_stack() 2023-01-11T22:51:00.8227922Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8228154Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8228900Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8229638Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.8229760Z File "", line 1, in 2023-01-11T22:51:00.8229951Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8230143Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8230341Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8230487Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8230689Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8230790Z self.run() 2023-01-11T22:51:00.8230983Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8231124Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8231446Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8231573Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8231925Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8232040Z getattr(self, test_name)() 2023-01-11T22:51:00.8232396Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8232487Z fn() 2023-01-11T22:51:00.8232891Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8233016Z test(self, **param_kwargs) 2023-01-11T22:51:00.8233355Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8233477Z return func(*args, **kwargs) 2023-01-11T22:51:00.8233721Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8233833Z self.run_subtests( 2023-01-11T22:51:00.8234174Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8234332Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8234695Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8234841Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8235197Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8235311Z output = model(*input) 2023-01-11T22:51:00.8235626Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8235758Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8236125Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8236288Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8236646Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8236767Z _lazy_init(state, module) 2023-01-11T22:51:00.8237100Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8237265Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8237655Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8237792Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8238124Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8238242Z return func(*args, **kwargs) 2023-01-11T22:51:00.8238611Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8238710Z p_assert( 2023-01-11T22:51:00.8239026Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8239202Z traceback.print_stack() 2023-01-11T22:51:00.8239327Z File "", line 1, in 2023-01-11T22:51:00.8239535Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8239673Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8239871Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8240017Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8240209Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8240310Z self.run() 2023-01-11T22:51:00.8240506Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8240645Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8240981Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8241109Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8241462Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8241623Z getattr(self, test_name)() 2023-01-11T22:51:00.8241975Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8242064Z fn() 2023-01-11T22:51:00.8242420Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8242540Z test(self, **param_kwargs) 2023-01-11T22:51:00.8242886Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.8243006Z return func(*args, **kwargs) 2023-01-11T22:51:00.8243252Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8243364Z self.run_subtests( 2023-01-11T22:51:00.8243697Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8243854Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8244212Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8244360Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8244721Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8244833Z output = model(*input) 2023-01-11T22:51:00.8245154Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8245287Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8245643Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8245812Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8246174Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8246290Z _lazy_init(state, module) 2023-01-11T22:51:00.8246638Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8246804Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8247195Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8247333Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.8247651Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8247774Z return func(*args, **kwargs) 2023-01-11T22:51:00.8248205Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8248302Z p_assert( 2023-01-11T22:51:00.8248638Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8248757Z traceback.print_stack() 2023-01-11T22:51:00.8248988Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8249213Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8249323Z File "", line 1, in 2023-01-11T22:51:00.8249523Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8249658Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8249849Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8249997Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8250198Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8250297Z self.run() 2023-01-11T22:51:00.8250538Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8250673Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8251007Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8251134Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8251488Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8251607Z getattr(self, test_name)() 2023-01-11T22:51:00.8251960Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8252050Z fn() 2023-01-11T22:51:00.8252391Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8252513Z test(self, **param_kwargs) 2023-01-11T22:51:00.8252862Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8252981Z return func(*args, **kwargs) 2023-01-11T22:51:00.8253227Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8253336Z self.run_subtests( 2023-01-11T22:51:00.8253677Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8253830Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8254176Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8254324Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8254698Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8254813Z output = model(*input) 2023-01-11T22:51:00.8255136Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8255268Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8255638Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8255808Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8256167Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.8256270Z _lazy_init(state, module) 2023-01-11T22:51:00.8256867Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8257133Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8257535Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8257673Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8258003Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8258121Z return func(*args, **kwargs) 2023-01-11T22:51:00.8258496Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8258583Z p_assert( 2023-01-11T22:51:00.8258910Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8259030Z traceback.print_stack() 2023-01-11T22:51:00.8259150Z File "", line 1, in 2023-01-11T22:51:00.8259350Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8259486Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8259677Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8259874Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8260086Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8260182Z self.run() 2023-01-11T22:51:00.8260377Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8260517Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8260855Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8260980Z 
self.run_test(test_name, pipe) 2023-01-11T22:51:00.8261332Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8261445Z getattr(self, test_name)() 2023-01-11T22:51:00.8261793Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8261881Z fn() 2023-01-11T22:51:00.8262238Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8262354Z test(self, **param_kwargs) 2023-01-11T22:51:00.8262699Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8262816Z return func(*args, **kwargs) 2023-01-11T22:51:00.8263059Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8263156Z self.run_subtests( 2023-01-11T22:51:00.8263497Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8263649Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8264007Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8264155Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8264518Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8264627Z output = model(*input) 2023-01-11T22:51:00.8264940Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8265060Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8265427Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:51:00.8265594Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8265951Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8266123Z _lazy_init(state, module) 2023-01-11T22:51:00.8266475Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8266636Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8267022Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8267145Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8267474Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8267592Z return func(*args, **kwargs) 2023-01-11T22:51:00.8267961Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8268057Z p_assert( 2023-01-11T22:51:00.8268387Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8268513Z traceback.print_stack() 2023-01-11T22:51:00.8268800Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8269023Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.8269148Z File "", line 1, in 2023-01-11T22:51:00.8269345Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8269476Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8269667Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8269809Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8270007Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8270106Z self.run() 2023-01-11T22:51:00.8270289Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8270432Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8270772Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8270898Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8271247Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8271367Z getattr(self, test_name)() 2023-01-11T22:51:00.8271713Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8271793Z fn() 2023-01-11T22:51:00.8272147Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8272270Z test(self, **param_kwargs) 2023-01-11T22:51:00.8272619Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8272741Z return func(*args, **kwargs) 2023-01-11T22:51:00.8272986Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8273096Z self.run_subtests( 2023-01-11T22:51:00.8273437Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8273580Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8273937Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8274082Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8274448Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8274563Z output = model(*input) 2023-01-11T22:51:00.8274881Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8275074Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8275452Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8275607Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8275966Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8276076Z _lazy_init(state, module) 2023-01-11T22:51:00.8276421Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8276582Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8276973Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8277112Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8277441Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8277605Z return func(*args, **kwargs) 2023-01-11T22:51:00.8277969Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8278063Z p_assert( 2023-01-11T22:51:00.8278388Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8278506Z traceback.print_stack() 2023-01-11T22:51:00.8278623Z File "", line 1, in 2023-01-11T22:51:00.8278821Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8278957Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8279138Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8279287Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8279488Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8279585Z self.run() 2023-01-11T22:51:00.8279783Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8279922Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8280256Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8280383Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8280780Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8280901Z getattr(self, test_name)() 2023-01-11T22:51:00.8281254Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8281349Z fn() 2023-01-11T22:51:00.8281707Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8281823Z test(self, **param_kwargs) 2023-01-11T22:51:00.8282173Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.8282292Z return func(*args, **kwargs) 2023-01-11T22:51:00.8282524Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8282633Z self.run_subtests( 2023-01-11T22:51:00.8282976Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8283132Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8283489Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8283635Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8284060Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8284178Z output = model(*input) 2023-01-11T22:51:00.8284488Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8284624Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8284992Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8285159Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8285516Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8285633Z _lazy_init(state, module) 2023-01-11T22:51:00.8285976Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8286142Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8286563Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8286709Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.8287046Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8287168Z return func(*args, **kwargs) 2023-01-11T22:51:00.8287536Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8287635Z p_assert( 2023-01-11T22:51:00.8287965Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8288087Z traceback.print_stack() 2023-01-11T22:51:00.8288305Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8288541Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8288663Z File "", line 1, in 2023-01-11T22:51:00.8288867Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8289003Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8289196Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8289339Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8289532Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8289631Z self.run() 2023-01-11T22:51:00.8289831Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8289972Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8290309Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8290442Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8290795Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8290917Z getattr(self, test_name)() 2023-01-11T22:51:00.8291255Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8291350Z fn() 2023-01-11T22:51:00.8291706Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8291826Z test(self, **param_kwargs) 2023-01-11T22:51:00.8292174Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8292295Z return func(*args, **kwargs) 2023-01-11T22:51:00.8292590Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8292794Z self.run_subtests( 2023-01-11T22:51:00.8293131Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8293293Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8293652Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8293797Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8294162Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8294279Z output = model(*input) 2023-01-11T22:51:00.8294596Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8294730Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8295084Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8295257Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8295661Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.8295783Z _lazy_init(state, module) 2023-01-11T22:51:00.8296129Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8296290Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8296868Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8297013Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8297343Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8297465Z return func(*args, **kwargs) 2023-01-11T22:51:00.8297841Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8297936Z p_assert( 2023-01-11T22:51:00.8298267Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8298387Z traceback.print_stack() 2023-01-11T22:51:00.8298513Z File "", line 1, in 2023-01-11T22:51:00.8298717Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8298840Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8299034Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8299181Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8299385Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8299484Z self.run() 2023-01-11T22:51:00.8299685Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8299826Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8300161Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8300276Z 
self.run_test(test_name, pipe) 2023-01-11T22:51:00.8300630Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8300749Z getattr(self, test_name)() 2023-01-11T22:51:00.8301100Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8301192Z fn() 2023-01-11T22:51:00.8301550Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8301664Z test(self, **param_kwargs) 2023-01-11T22:51:00.8301995Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8302205Z return func(*args, **kwargs) 2023-01-11T22:51:00.8302452Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8302558Z self.run_subtests( 2023-01-11T22:51:00.8302904Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8303057Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8303412Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8303556Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8303924Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8304027Z output = model(*input) 2023-01-11T22:51:00.8304347Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8304477Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8304902Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:51:00.8305079Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8305442Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8305560Z _lazy_init(state, module) 2023-01-11T22:51:00.8305905Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8306056Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8306445Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8306589Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8306917Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8307042Z return func(*args, **kwargs) 2023-01-11T22:51:00.8307411Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8307507Z p_assert( 2023-01-11T22:51:00.8307839Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8307948Z traceback.print_stack() 2023-01-11T22:51:00.8308176Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8308407Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8309147Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.8309886Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8310013Z File "", line 1, in 2023-01-11T22:51:00.8310223Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8310363Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8310560Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8310691Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8310969Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8311069Z self.run() 2023-01-11T22:51:00.8311274Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8311418Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8311754Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8311877Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8312229Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8312335Z getattr(self, test_name)() 2023-01-11T22:51:00.8312687Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8312779Z fn() 2023-01-11T22:51:00.8313137Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8313258Z test(self, **param_kwargs) 2023-01-11T22:51:00.8313651Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 
167, in wrapper 2023-01-11T22:51:00.8313775Z return func(*args, **kwargs) 2023-01-11T22:51:00.8314022Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8314119Z self.run_subtests( 2023-01-11T22:51:00.8314462Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8314616Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8314969Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8315115Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8315483Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8315602Z output = model(*input) 2023-01-11T22:51:00.8315920Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8316040Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8316412Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8316583Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8316938Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8317053Z _lazy_init(state, module) 2023-01-11T22:51:00.8317394Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8317551Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8317943Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8318071Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.8318403Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8318520Z return func(*args, **kwargs) 2023-01-11T22:51:00.8318887Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8318982Z p_assert( 2023-01-11T22:51:00.8319307Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8319427Z traceback.print_stack() 2023-01-11T22:51:00.8319550Z File "", line 1, in 2023-01-11T22:51:00.8319739Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8319932Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8320126Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8320279Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8320479Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8320581Z self.run() 2023-01-11T22:51:00.8320775Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8320902Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8321235Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8321362Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8321713Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8321832Z getattr(self, test_name)() 2023-01-11T22:51:00.8322190Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8322282Z fn() 2023-01-11T22:51:00.8322682Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8322792Z test(self, **param_kwargs) 2023-01-11T22:51:00.8323146Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8323268Z return func(*args, **kwargs) 2023-01-11T22:51:00.8323515Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8323623Z self.run_subtests( 2023-01-11T22:51:00.8323968Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8324125Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8324485Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8324618Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8324988Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8325104Z output = model(*input) 2023-01-11T22:51:00.8325419Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8325549Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8325914Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8326083Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8326439Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8326545Z _lazy_init(state, module) 2023-01-11T22:51:00.8326892Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 
2023-01-11T22:51:00.8327059Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8327449Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8327590Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8327929Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8328051Z return func(*args, **kwargs) 2023-01-11T22:51:00.8328419Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8328504Z p_assert( 2023-01-11T22:51:00.8328837Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8329011Z traceback.print_stack() 2023-01-11T22:51:00.8329244Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8329477Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.8329604Z File "", line 1, in 2023-01-11T22:51:00.8329808Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8329948Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8330129Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8330276Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8330479Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8330581Z self.run() 2023-01-11T22:51:00.8330776Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8330922Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8331259Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8331442Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8331793Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8331913Z getattr(self, test_name)() 2023-01-11T22:51:00.8332264Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8332359Z fn() 2023-01-11T22:51:00.8332716Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8332834Z test(self, **param_kwargs) 2023-01-11T22:51:00.8333181Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8333302Z return func(*args, **kwargs) 2023-01-11T22:51:00.8333533Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8333644Z self.run_subtests( 2023-01-11T22:51:00.8333986Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8334142Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8334494Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8334642Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8335010Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8335127Z output = model(*input) 2023-01-11T22:51:00.8335432Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8335571Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8335942Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8336113Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8336472Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8336785Z _lazy_init(state, module) 2023-01-11T22:51:00.8337147Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8337311Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8337686Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8337821Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8338243Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8338367Z return func(*args, **kwargs) 2023-01-11T22:51:00.8338740Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8338838Z p_assert( 2023-01-11T22:51:00.8339169Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8339291Z traceback.print_stack() 2023-01-11T22:51:00.8339402Z File "", line 1, in 2023-01-11T22:51:00.8339605Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8339737Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8339932Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8340078Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8340286Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8340385Z self.run() 2023-01-11T22:51:00.8340627Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8340778Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8341112Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8341240Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8341596Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8341710Z getattr(self, test_name)() 2023-01-11T22:51:00.8342061Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8342151Z fn() 2023-01-11T22:51:00.8342494Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8342615Z test(self, **param_kwargs) 2023-01-11T22:51:00.8342965Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.8343085Z return func(*args, **kwargs) 2023-01-11T22:51:00.8343328Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8343439Z self.run_subtests( 2023-01-11T22:51:00.8343783Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8343940Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8344283Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8344432Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8344802Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8344921Z output = model(*input) 2023-01-11T22:51:00.8345245Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8345377Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8345744Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8345912Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8346256Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8346373Z _lazy_init(state, module) 2023-01-11T22:51:00.8346719Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8346881Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8347325Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8347467Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.8347801Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8347917Z return func(*args, **kwargs) 2023-01-11T22:51:00.8348274Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8348375Z p_assert( 2023-01-11T22:51:00.8348700Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8348815Z traceback.print_stack() 2023-01-11T22:51:00.8349044Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8349270Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8349394Z File "", line 1, in 2023-01-11T22:51:00.8349638Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8349767Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8349959Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8350102Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8350306Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8350408Z self.run() 2023-01-11T22:51:00.8350610Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8350752Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8351091Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8351206Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8351562Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8351682Z getattr(self, test_name)() 2023-01-11T22:51:00.8352037Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8352131Z fn() 2023-01-11T22:51:00.8352487Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8352605Z test(self, **param_kwargs) 2023-01-11T22:51:00.8352936Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8353059Z return func(*args, **kwargs) 2023-01-11T22:51:00.8353305Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8353411Z self.run_subtests( 2023-01-11T22:51:00.8353758Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8353916Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8354276Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8354424Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8354788Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8354889Z output = model(*input) 2023-01-11T22:51:00.8355209Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8355344Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8355715Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8355938Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8356301Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.8356415Z _lazy_init(state, module) 2023-01-11T22:51:00.8356760Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8356909Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8357294Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8357430Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8357759Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8357877Z return func(*args, **kwargs) 2023-01-11T22:51:00.8358242Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8358346Z p_assert( 2023-01-11T22:51:00.8358722Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8358837Z traceback.print_stack() 2023-01-11T22:51:00.8358963Z File "", line 1, in 2023-01-11T22:51:00.8359166Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8359302Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8359496Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8359639Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8359843Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8359929Z self.run() 2023-01-11T22:51:00.8360120Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8360260Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8360594Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8360726Z 
self.run_test(test_name, pipe) 2023-01-11T22:51:00.8361080Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8361200Z getattr(self, test_name)() 2023-01-11T22:51:00.8361546Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8361627Z fn() 2023-01-11T22:51:00.8361985Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8362105Z test(self, **param_kwargs) 2023-01-11T22:51:00.8362454Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8362581Z return func(*args, **kwargs) 2023-01-11T22:51:00.8362824Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8362935Z self.run_subtests( 2023-01-11T22:51:00.8363281Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8363425Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8363776Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8363924Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8364287Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8364400Z output = model(*input) 2023-01-11T22:51:00.8364717Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8364904Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8365275Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:51:00.8365430Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8365787Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8365901Z _lazy_init(state, module) 2023-01-11T22:51:00.8366248Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8366409Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8366796Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8366929Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8367263Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8367370Z return func(*args, **kwargs) 2023-01-11T22:51:00.8367780Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8367882Z p_assert( 2023-01-11T22:51:00.8368213Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8368335Z traceback.print_stack() 2023-01-11T22:51:00.8368563Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8368793Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.8368916Z File "", line 1, in 2023-01-11T22:51:00.8369106Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8369247Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8369443Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8369591Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8369801Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8369903Z self.run() 2023-01-11T22:51:00.8370100Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8370228Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8370565Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8370691Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8371043Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8371164Z getattr(self, test_name)() 2023-01-11T22:51:00.8371515Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8371606Z fn() 2023-01-11T22:51:00.8371964Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8372069Z test(self, **param_kwargs) 2023-01-11T22:51:00.8372419Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8372535Z return func(*args, **kwargs) 2023-01-11T22:51:00.8372778Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8372886Z self.run_subtests( 2023-01-11T22:51:00.8373224Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8373381Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8373791Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8373926Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8374295Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8374405Z output = model(*input) 2023-01-11T22:51:00.8374722Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8374852Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8375215Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8375379Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8375736Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8375842Z _lazy_init(state, module) 2023-01-11T22:51:00.8376189Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8376395Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8376982Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8377119Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8377456Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8377575Z return func(*args, **kwargs) 2023-01-11T22:51:00.8377942Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8378038Z p_assert( 2023-01-11T22:51:00.8378354Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8378478Z traceback.print_stack() 2023-01-11T22:51:00.8378600Z File "", line 1, in 2023-01-11T22:51:00.8378807Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8378943Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8379138Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8379277Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8379470Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8379566Z self.run() 2023-01-11T22:51:00.8379758Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8379895Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8380228Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8380359Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8380708Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8380827Z getattr(self, test_name)() 2023-01-11T22:51:00.8381165Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8381255Z fn() 2023-01-11T22:51:00.8381608Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8381727Z test(self, **param_kwargs) 2023-01-11T22:51:00.8382076Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.8382197Z return func(*args, **kwargs) 2023-01-11T22:51:00.8382441Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8382635Z self.run_subtests( 2023-01-11T22:51:00.8382968Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8383124Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8383480Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8383627Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8383995Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8384112Z output = model(*input) 2023-01-11T22:51:00.8384426Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8384561Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8384915Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8385088Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8385503Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8385623Z _lazy_init(state, module) 2023-01-11T22:51:00.8385968Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8386128Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8386515Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8386649Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.8386966Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8387084Z return func(*args, **kwargs) 2023-01-11T22:51:00.8387457Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8387553Z p_assert( 2023-01-11T22:51:00.8387880Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8388001Z traceback.print_stack() 2023-01-11T22:51:00.8388229Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8388452Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8389178Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8389917Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8390652Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8391376Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8392099Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8392931Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8393652Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8394428Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.8394564Z File "", line 1, in 2023-01-11T22:51:00.8394771Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8394908Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8395110Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8395255Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8395459Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8395556Z self.run() 2023-01-11T22:51:00.8395740Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8395878Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8396224Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8396350Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8396477Z File "", line 1, in 2023-01-11T22:51:00.8396835Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8396952Z getattr(self, test_name)() 2023-01-11T22:51:00.8397290Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8397380Z fn() 2023-01-11T22:51:00.8397578Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8397709Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8398065Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8398184Z test(self, **param_kwargs) 2023-01-11T22:51:00.8398375Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8398520Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8398860Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8398979Z return func(*args, **kwargs) 2023-01-11T22:51:00.8399184Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8399284Z self.run() 2023-01-11T22:51:00.8399530Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8399633Z self.run_subtests( 2023-01-11T22:51:00.8399830Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8399970Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8400303Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8400514Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8400850Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8400977Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8401335Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8401483Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8401835Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8401953Z getattr(self, test_name)() 2023-01-11T22:51:00.8402308Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8402423Z output = model(*input) 2023-01-11T22:51:00.8402776Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8402869Z fn() 2023-01-11T22:51:00.8403233Z File 
"/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8403374Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8403734Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8403852Z test(self, **param_kwargs) 2023-01-11T22:51:00.8404206Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8404372Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8404721Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8404842Z return func(*args, **kwargs) 2023-01-11T22:51:00.8405203Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8405319Z _lazy_init(state, module) 2023-01-11T22:51:00.8405564Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8405672Z self.run_subtests( 2023-01-11T22:51:00.8406004Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8406163Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8406503Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8406656Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8407046Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8407184Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8407546Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", 
line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8407693Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8408011Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8408127Z return func(*args, **kwargs) 2023-01-11T22:51:00.8408491Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8408600Z output = model(*input) 2023-01-11T22:51:00.8408967Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8409062Z p_assert( 2023-01-11T22:51:00.8409376Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8409559Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8409881Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8409998Z traceback.print_stack() 2023-01-11T22:51:00.8410367Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8410536Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8410892Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8411006Z _lazy_init(state, module) 2023-01-11T22:51:00.8411348Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8411508Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8411883Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8412023Z handle.init_flat_param_attributes() 
2023-01-11T22:51:00.8412394Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8412521Z return func(*args, **kwargs) 2023-01-11T22:51:00.8412898Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8412996Z p_assert( 2023-01-11T22:51:00.8413322Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8413442Z traceback.print_stack() 2023-01-11T22:51:00.8413662Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8413887Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8414118Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8414341Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8414565Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8414782Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8415003Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8415224Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8415432Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8415651Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8415868Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8416085Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.8416309Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8416523Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8416929Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8417144Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8417365Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8417571Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8417788Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8418001Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8418309Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8418531Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8418745Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8418960Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8419713Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8420446Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8421232Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8421978Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8422703Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8423432Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8424155Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8424876Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8425598Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8426317Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8427037Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8427817Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8428536Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.8429238Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8430032Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8430759Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8431476Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8432203Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8432917Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8433636Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8433876Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8434102Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8434330Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8434556Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8434773Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8434993Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8435209Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8435428Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8435690Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8435911Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8436125Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8436343Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8436555Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.8436773Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8436988Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8437204Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8437412Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8437630Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8437886Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8438110Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8438324Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8438547Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8438760Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8438973Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8440001Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_exec_order_utils.py:259: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:51:00.8440110Z world_indices[ 2023-01-11T22:51:00.8440825Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8441552Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8442283Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8443005Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8443719Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8444495Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.8444604Z dist init r=1, world=2 2023-01-11T22:51:00.8444926Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.8445236Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.8445539Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.8445840Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.8446177Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.8446480Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.8446775Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.8447065Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.8447360Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.8447647Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 
2023-01-11T22:51:00.8447943Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.8448236Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.8448341Z dist init r=0, world=2 2023-01-11T22:51:00.8448656Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.8448965Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.8449270Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.8449564Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.8449855Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.8450149Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.8450440Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.8450792Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 
2023-01-11T22:51:00.8451071Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.8451362Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.8451656Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.8451950Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.8452051Z ok (6.514s) 2023-01-11T22:51:00.8452440Z test_nested_wrapped_model_offload_true_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 95273 2023-01-11T22:51:00.8452661Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 95274 2023-01-11T22:51:00.8453037Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.8453209Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.8453571Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.8453755Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.8454113Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.8454286Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.8454653Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 
2023-01-11T22:51:00.8454839Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.8455079Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.8455315Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.8455708Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.8456080Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.8456304Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.8456527Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.8456962Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8457192Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8458209Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.8458322Z warnings.warn( 2023-01-11T22:51:00.8459325Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. 
We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.8459520Z warnings.warn( 2023-01-11T22:51:00.8459646Z File "<string>", line 1, in <module> 2023-01-11T22:51:00.8459850Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8459975Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8460172Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8460319Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8460525Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8460624Z self.run() 2023-01-11T22:51:00.8460818Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8460963Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8461346Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8461482Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8461840Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8461958Z getattr(self, test_name)() 2023-01-11T22:51:00.8462311Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8462407Z fn() 2023-01-11T22:51:00.8462768Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8462887Z test(self, **param_kwargs) 2023-01-11T22:51:00.8463225Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8463352Z return func(*args, **kwargs) 
2023-01-11T22:51:00.8463600Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8463712Z self.run_subtests( 2023-01-11T22:51:00.8464056Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8464212Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8464567Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8464713Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8465068Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8465184Z output = model(*input) 2023-01-11T22:51:00.8465507Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8465642Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8466012Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8466180Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8466541Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8466656Z _lazy_init(state, module) 2023-01-11T22:51:00.8466989Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8467156Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8467547Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8467734Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8468071Z File 
"/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8468194Z return func(*args, **kwargs) 2023-01-11T22:51:00.8468567Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8468664Z p_assert( 2023-01-11T22:51:00.8468997Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8469106Z traceback.print_stack() 2023-01-11T22:51:00.8469226Z File "<string>", line 1, in <module> 2023-01-11T22:51:00.8469429Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8469568Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8469764Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8469911Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8470115Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8470202Z self.run() 2023-01-11T22:51:00.8470446Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8470596Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8470935Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8471066Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8471421Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8471536Z getattr(self, test_name)() 2023-01-11T22:51:00.8471890Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8471970Z fn() 2023-01-11T22:51:00.8472324Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 
2023-01-11T22:51:00.8472446Z test(self, **param_kwargs) 2023-01-11T22:51:00.8472796Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8472918Z return func(*args, **kwargs) 2023-01-11T22:51:00.8473164Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8473273Z self.run_subtests( 2023-01-11T22:51:00.8473617Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8473760Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8474115Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8474260Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8474630Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8474742Z output = model(*input) 2023-01-11T22:51:00.8475065Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8475197Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8475566Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8475723Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8476081Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8476193Z _lazy_init(state, module) 2023-01-11T22:51:00.8476542Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8476758Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8477157Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8477295Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8477629Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8477737Z return func(*args, **kwargs) 2023-01-11T22:51:00.8478109Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8478206Z p_assert( 2023-01-11T22:51:00.8478538Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8478658Z traceback.print_stack() 2023-01-11T22:51:00.8478888Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8479117Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8479243Z File "<string>", line 1, in <module> 2023-01-11T22:51:00.8479486Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8479629Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8479825Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8479970Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8480174Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8480272Z self.run() 2023-01-11T22:51:00.8480463Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8480590Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8480927Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8481061Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8481409Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8481532Z getattr(self, test_name)() 2023-01-11T22:51:00.8481880Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8481973Z fn() 2023-01-11T22:51:00.8482324Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8482429Z test(self, **param_kwargs) 2023-01-11T22:51:00.8482777Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8482894Z return func(*args, **kwargs) 2023-01-11T22:51:00.8483142Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8483256Z self.run_subtests( 2023-01-11T22:51:00.8483597Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8483758Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8484119Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8484254Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8484620Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8484735Z output = model(*input) 2023-01-11T22:51:00.8485051Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8485183Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8485553Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8485778Z args, kwargs = 
_root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8486144Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8486248Z _lazy_init(state, module) 2023-01-11T22:51:00.8486596Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8486756Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8487147Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8487285Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8487616Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8487738Z return func(*args, **kwargs) 2023-01-11T22:51:00.8488110Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8488195Z p_assert( 2023-01-11T22:51:00.8488569Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8488699Z traceback.print_stack() 2023-01-11T22:51:00.8488826Z File "<string>", line 1, in <module> 2023-01-11T22:51:00.8489031Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8489167Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8489365Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8489513Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8489706Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8489802Z self.run() 2023-01-11T22:51:00.8489997Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8490146Z self._target(*self._args, 
**self._kwargs) 2023-01-11T22:51:00.8490483Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8490612Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8490968Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8491087Z getattr(self, test_name)() 2023-01-11T22:51:00.8491426Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8491517Z fn() 2023-01-11T22:51:00.8491873Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8491990Z test(self, **param_kwargs) 2023-01-11T22:51:00.8492337Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8492514Z return func(*args, **kwargs) 2023-01-11T22:51:00.8492766Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8492863Z self.run_subtests( 2023-01-11T22:51:00.8493210Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8493367Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8493724Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8493869Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8494235Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8494348Z output = model(*input) 2023-01-11T22:51:00.8494725Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8494857Z 
return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8495216Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8495382Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8495738Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8495855Z _lazy_init(state, module) 2023-01-11T22:51:00.8496201Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8496364Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8496993Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8497141Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8497468Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8497662Z return func(*args, **kwargs) 2023-01-11T22:51:00.8498046Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8498145Z p_assert( 2023-01-11T22:51:00.8498471Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8498590Z traceback.print_stack() 2023-01-11T22:51:00.8498821Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8499048Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.8499160Z File "", line 1, in 2023-01-11T22:51:00.8499363Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8499506Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8499702Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8499851Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8500056Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8500157Z self.run() 2023-01-11T22:51:00.8500338Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8500481Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8500813Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8500941Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8501295Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8501418Z getattr(self, test_name)() 2023-01-11T22:51:00.8501764Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8501858Z fn() 2023-01-11T22:51:00.8502203Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8502320Z test(self, **param_kwargs) 2023-01-11T22:51:00.8502668Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8502790Z return func(*args, **kwargs) 2023-01-11T22:51:00.8503033Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8503142Z self.run_subtests( 2023-01-11T22:51:00.8503486Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8503643Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8504063Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8504215Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8504585Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8504695Z output = model(*input) 2023-01-11T22:51:00.8505013Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8505148Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8505518Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8505684Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8506031Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8506144Z _lazy_init(state, module) 2023-01-11T22:51:00.8506533Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8506698Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8507090Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8507228Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8507557Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8507678Z return func(*args, **kwargs) 2023-01-11T22:51:00.8508035Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8508134Z p_assert( 2023-01-11T22:51:00.8508459Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8508586Z traceback.print_stack() 2023-01-11T22:51:00.8508708Z File "", line 1, in 2023-01-11T22:51:00.8508908Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8509047Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8509242Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8509374Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8509579Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8509678Z self.run() 2023-01-11T22:51:00.8509876Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8510018Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8510348Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8510479Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8510823Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8510942Z getattr(self, test_name)() 2023-01-11T22:51:00.8511292Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8511386Z fn() 2023-01-11T22:51:00.8511743Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8511863Z test(self, **param_kwargs) 2023-01-11T22:51:00.8512212Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.8512330Z return func(*args, **kwargs) 2023-01-11T22:51:00.8512560Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8512722Z self.run_subtests( 2023-01-11T22:51:00.8513071Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8513230Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8513591Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8513739Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8514104Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8514218Z output = model(*input) 2023-01-11T22:51:00.8514523Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8514658Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8515029Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8515201Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8515620Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8515747Z _lazy_init(state, module) 2023-01-11T22:51:00.8516094Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8516254Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8516643Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8516769Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.8517100Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8517224Z return func(*args, **kwargs) 2023-01-11T22:51:00.8517601Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8517697Z p_assert( 2023-01-11T22:51:00.8518025Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8518145Z traceback.print_stack() 2023-01-11T22:51:00.8518362Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8518589Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8519329Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8520069Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.8520196Z File "", line 1, in 2023-01-11T22:51:00.8520398Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8520537Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8520733Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8520877Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8521080Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8521166Z self.run() 2023-01-11T22:51:00.8521360Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8521558Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8521893Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8522024Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8522378Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8522498Z getattr(self, test_name)() 2023-01-11T22:51:00.8522850Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8522930Z fn() 2023-01-11T22:51:00.8523285Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8523403Z test(self, **param_kwargs) 2023-01-11T22:51:00.8523754Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8523874Z return func(*args, **kwargs) 2023-01-11T22:51:00.8524120Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8524270Z self.run_subtests( 2023-01-11T22:51:00.8524627Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8524770Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8525124Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8525274Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8525644Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8525754Z output = model(*input) 2023-01-11T22:51:00.8526073Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8526209Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8526578Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8526733Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8527087Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8527203Z _lazy_init(state, module) 2023-01-11T22:51:00.8527550Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8527712Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8528098Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8528235Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8528571Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8528679Z return func(*args, **kwargs) 2023-01-11T22:51:00.8529050Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8529147Z p_assert( 2023-01-11T22:51:00.8529475Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8529596Z traceback.print_stack() 2023-01-11T22:51:00.8529720Z File "", line 1, in 2023-01-11T22:51:00.8529919Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8530054Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8530236Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8530382Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8530642Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8530742Z self.run() 2023-01-11T22:51:00.8530942Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8531082Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8531417Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8531532Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8531882Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8531998Z getattr(self, test_name)() 2023-01-11T22:51:00.8532349Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8532443Z fn() 2023-01-11T22:51:00.8532804Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8532927Z test(self, **param_kwargs) 2023-01-11T22:51:00.8533320Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.8533435Z return func(*args, **kwargs) 2023-01-11T22:51:00.8533681Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8533785Z self.run_subtests( 2023-01-11T22:51:00.8534131Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8534286Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8534643Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8534788Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8535152Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8535258Z output = model(*input) 2023-01-11T22:51:00.8535580Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8535713Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8536081Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8536249Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8536793Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8536918Z _lazy_init(state, module) 2023-01-11T22:51:00.8537268Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8537417Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8537810Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8537949Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.8538282Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8538400Z return func(*args, **kwargs) 2023-01-11T22:51:00.8538764Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8538857Z p_assert( 2023-01-11T22:51:00.8539184Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8539293Z traceback.print_stack() 2023-01-11T22:51:00.8539519Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8539745Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8539949Z File "", line 1, in 2023-01-11T22:51:00.8540159Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8540298Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8540492Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8540638Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8540831Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8540932Z self.run() 2023-01-11T22:51:00.8541127Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8541267Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8541603Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8541732Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8542090Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8542208Z getattr(self, test_name)() 2023-01-11T22:51:00.8542605Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8542706Z fn() 2023-01-11T22:51:00.8543063Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8543181Z test(self, **param_kwargs) 2023-01-11T22:51:00.8543529Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8543649Z return func(*args, **kwargs) 2023-01-11T22:51:00.8543897Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8543993Z self.run_subtests( 2023-01-11T22:51:00.8544339Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8544493Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8544853Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8545003Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8545376Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8545488Z output = model(*input) 2023-01-11T22:51:00.8545803Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8545924Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8546292Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8546457Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8546817Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.8546936Z _lazy_init(state, module) 2023-01-11T22:51:00.8547281Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8547442Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8547831Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8547967Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8548284Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8548407Z return func(*args, **kwargs) 2023-01-11T22:51:00.8548777Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8548930Z p_assert( 2023-01-11T22:51:00.8549265Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8549388Z traceback.print_stack() 2023-01-11T22:51:00.8549511Z File "", line 1, in 2023-01-11T22:51:00.8549700Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8549838Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8550033Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8550182Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8550389Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8550490Z self.run() 2023-01-11T22:51:00.8550684Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8550827Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8551144Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8551319Z 
self.run_test(test_name, pipe) 2023-01-11T22:51:00.8551686Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8551805Z getattr(self, test_name)() 2023-01-11T22:51:00.8552160Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8552253Z fn() 2023-01-11T22:51:00.8552608Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8552724Z test(self, **param_kwargs) 2023-01-11T22:51:00.8553060Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8553186Z return func(*args, **kwargs) 2023-01-11T22:51:00.8553430Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8553539Z self.run_subtests( 2023-01-11T22:51:00.8553882Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8554037Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8554392Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8554539Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8554892Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8555007Z output = model(*input) 2023-01-11T22:51:00.8555326Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8555459Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8555830Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:51:00.8555994Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8556352Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8556465Z _lazy_init(state, module) 2023-01-11T22:51:00.8556798Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8556956Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8557344Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8557479Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8557871Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8557989Z return func(*args, **kwargs) 2023-01-11T22:51:00.8558362Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8558458Z p_assert( 2023-01-11T22:51:00.8558773Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8558897Z traceback.print_stack() 2023-01-11T22:51:00.8559128Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8559357Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.8559480Z File "", line 1, in 2023-01-11T22:51:00.8559681Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8559821Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8560017Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8560194Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8560406Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8560504Z self.run() 2023-01-11T22:51:00.8560700Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8560834Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8561167Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8561290Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8561628Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8561748Z getattr(self, test_name)() 2023-01-11T22:51:00.8562106Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8562200Z fn() 2023-01-11T22:51:00.8562559Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8562672Z test(self, **param_kwargs) 2023-01-11T22:51:00.8563017Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8563134Z return func(*args, **kwargs) 2023-01-11T22:51:00.8563365Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8563475Z self.run_subtests( 2023-01-11T22:51:00.8563821Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8563978Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8564341Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8564487Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8564858Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8564969Z output = model(*input) 2023-01-11T22:51:00.8565276Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8565412Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8565776Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8565945Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8566303Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8566500Z _lazy_init(state, module) 2023-01-11T22:51:00.8566847Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8567010Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8567399Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8567525Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8567854Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8567970Z return func(*args, **kwargs) 2023-01-11T22:51:00.8568334Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8568431Z p_assert( 2023-01-11T22:51:00.8568759Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8568883Z traceback.print_stack() 2023-01-11T22:51:00.8568994Z File "", line 1, in 2023-01-11T22:51:00.8569246Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8569391Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8569584Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8569727Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8569932Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8570034Z self.run() 2023-01-11T22:51:00.8570229Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8570357Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8570692Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8570826Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8571181Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8571303Z getattr(self, test_name)() 2023-01-11T22:51:00.8571659Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8571753Z fn() 2023-01-11T22:51:00.8572111Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8572216Z test(self, **param_kwargs) 2023-01-11T22:51:00.8572560Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.8572680Z return func(*args, **kwargs) 2023-01-11T22:51:00.8572925Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8573037Z self.run_subtests( 2023-01-11T22:51:00.8573383Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8573545Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8573905Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8574040Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8574406Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8574522Z output = model(*input) 2023-01-11T22:51:00.8574835Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8574970Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8575344Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8575570Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8575937Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8576040Z _lazy_init(state, module) 2023-01-11T22:51:00.8576384Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8576712Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8577125Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8577262Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.8577594Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8577713Z return func(*args, **kwargs) 2023-01-11T22:51:00.8578085Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8578177Z p_assert( 2023-01-11T22:51:00.8578587Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8578719Z traceback.print_stack() 2023-01-11T22:51:00.8578952Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8579178Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8579301Z File "", line 1, in 2023-01-11T22:51:00.8579504Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8579643Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8579824Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8579968Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8580175Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8580275Z self.run() 2023-01-11T22:51:00.8580475Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8580612Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8581011Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8581127Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8581482Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8581602Z getattr(self, test_name)() 2023-01-11T22:51:00.8581949Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8582040Z fn() 2023-01-11T22:51:00.8582395Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8582517Z test(self, **param_kwargs) 2023-01-11T22:51:00.8582868Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8582975Z return func(*args, **kwargs) 2023-01-11T22:51:00.8583219Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8583327Z self.run_subtests( 2023-01-11T22:51:00.8583669Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8583824Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8584179Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8584324Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8584688Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8584865Z output = model(*input) 2023-01-11T22:51:00.8585188Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8585318Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8585688Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8585855Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8586212Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.8586328Z _lazy_init(state, module) 2023-01-11T22:51:00.8586674Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8586822Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8587215Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8587398Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8587739Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8587862Z return func(*args, **kwargs) 2023-01-11T22:51:00.8588232Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8588330Z p_assert( 2023-01-11T22:51:00.8588657Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8588765Z traceback.print_stack() 2023-01-11T22:51:00.8588886Z File "", line 1, in 2023-01-11T22:51:00.8589084Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8589224Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8589422Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8589571Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8589778Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8589879Z self.run() 2023-01-11T22:51:00.8590062Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8590200Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8590532Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8590661Z 
self.run_test(test_name, pipe) 2023-01-11T22:51:00.8591017Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8591133Z getattr(self, test_name)() 2023-01-11T22:51:00.8591486Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8591579Z fn() 2023-01-11T22:51:00.8591924Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8592041Z test(self, **param_kwargs) 2023-01-11T22:51:00.8592391Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8592554Z return func(*args, **kwargs) 2023-01-11T22:51:00.8592801Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8592912Z self.run_subtests( 2023-01-11T22:51:00.8593260Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8593403Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8593821Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8593969Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8594336Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8594450Z output = model(*input) 2023-01-11T22:51:00.8594769Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8594905Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8595273Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:51:00.8595440Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8595786Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8595904Z _lazy_init(state, module) 2023-01-11T22:51:00.8596247Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8596455Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8596856Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8596994Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8597327Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8597448Z return func(*args, **kwargs) 2023-01-11T22:51:00.8597804Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8597904Z p_assert( 2023-01-11T22:51:00.8598232Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8598357Z traceback.print_stack() 2023-01-11T22:51:00.8598588Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8598820Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8599559Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.8600292Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8600422Z File "", line 1, in 2023-01-11T22:51:00.8600614Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8600753Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8600951Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8601094Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8601302Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8601401Z self.run() 2023-01-11T22:51:00.8601594Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8601731Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8602053Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8602177Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8602530Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8602702Z getattr(self, test_name)() 2023-01-11T22:51:00.8603058Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8603150Z fn() 2023-01-11T22:51:00.8603506Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8603627Z test(self, **param_kwargs) 2023-01-11T22:51:00.8603962Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 
167, in wrapper 2023-01-11T22:51:00.8604079Z return func(*args, **kwargs) 2023-01-11T22:51:00.8604322Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8604426Z self.run_subtests( 2023-01-11T22:51:00.8604767Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8604928Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8605332Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8605484Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8605843Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8605956Z output = model(*input) 2023-01-11T22:51:00.8606278Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8606409Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8606780Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8606950Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8607319Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8607436Z _lazy_init(state, module) 2023-01-11T22:51:00.8607770Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8607935Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8608329Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8608468Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.8608798Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8608919Z return func(*args, **kwargs) 2023-01-11T22:51:00.8609286Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8609385Z p_assert( 2023-01-11T22:51:00.8609700Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8609821Z traceback.print_stack() 2023-01-11T22:51:00.8609943Z File "", line 1, in 2023-01-11T22:51:00.8610140Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8610274Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8610470Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8610610Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8610813Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8610900Z self.run() 2023-01-11T22:51:00.8611097Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8611234Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8611622Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8611751Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8612109Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8612227Z getattr(self, test_name)() 2023-01-11T22:51:00.8612564Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8612658Z fn() 2023-01-11T22:51:00.8613016Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8613132Z test(self, **param_kwargs) 2023-01-11T22:51:00.8613480Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8613598Z return func(*args, **kwargs) 2023-01-11T22:51:00.8613844Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8613952Z self.run_subtests( 2023-01-11T22:51:00.8614326Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8614487Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8614843Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8614991Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8615361Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8615478Z output = model(*input) 2023-01-11T22:51:00.8615798Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8615936Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8616292Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8616467Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8617079Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8617200Z _lazy_init(state, module) 2023-01-11T22:51:00.8617550Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 
2023-01-11T22:51:00.8617714Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8618106Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8618242Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8618569Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8618682Z return func(*args, **kwargs) 2023-01-11T22:51:00.8619053Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8619150Z p_assert( 2023-01-11T22:51:00.8619479Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8619594Z traceback.print_stack() 2023-01-11T22:51:00.8619822Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8620045Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.8620156Z File "", line 1, in 2023-01-11T22:51:00.8620356Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8620493Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8620769Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8620909Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8621115Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8621212Z self.run() 2023-01-11T22:51:00.8621407Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8621534Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8621872Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8621996Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8622349Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8622468Z getattr(self, test_name)() 2023-01-11T22:51:00.8622821Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8622917Z fn() 2023-01-11T22:51:00.8623330Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8623445Z test(self, **param_kwargs) 2023-01-11T22:51:00.8623797Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8623918Z return func(*args, **kwargs) 2023-01-11T22:51:00.8624164Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8624273Z self.run_subtests( 2023-01-11T22:51:00.8624612Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8624769Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8625122Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8625260Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8625631Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8625742Z output = model(*input) 2023-01-11T22:51:00.8626059Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8626192Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8626559Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8626724Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8627080Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8627182Z _lazy_init(state, module) 2023-01-11T22:51:00.8627524Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8627685Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8628075Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8628216Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8628546Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8628667Z return func(*args, **kwargs) 2023-01-11T22:51:00.8629032Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8629117Z p_assert( 2023-01-11T22:51:00.8629445Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8629564Z traceback.print_stack() 2023-01-11T22:51:00.8629743Z File "", line 1, in 2023-01-11T22:51:00.8629945Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8630085Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8630281Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8630413Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8630618Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8630715Z self.run() 2023-01-11T22:51:00.8630906Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8631044Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8631376Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8631500Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8631854Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8631960Z getattr(self, test_name)() 2023-01-11T22:51:00.8632357Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8632459Z fn() 2023-01-11T22:51:00.8632817Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8632934Z test(self, **param_kwargs) 2023-01-11T22:51:00.8633279Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.8633395Z return func(*args, **kwargs) 2023-01-11T22:51:00.8633642Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8633737Z self.run_subtests( 2023-01-11T22:51:00.8634075Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8634233Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8634593Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8634735Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8635104Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8635216Z output = model(*input) 2023-01-11T22:51:00.8635531Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8635650Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8636017Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8636181Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8636536Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8636651Z _lazy_init(state, module) 2023-01-11T22:51:00.8636991Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8637150Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8637541Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8637665Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.8637995Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8638114Z return func(*args, **kwargs) 2023-01-11T22:51:00.8638486Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8638636Z p_assert( 2023-01-11T22:51:00.8638968Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8639094Z traceback.print_stack() 2023-01-11T22:51:00.8639324Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8639540Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8639663Z File "", line 1, in 2023-01-11T22:51:00.8639866Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8640000Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8640193Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8640336Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8640537Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8640636Z self.run() 2023-01-11T22:51:00.8640816Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8641006Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8641347Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8641475Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8641825Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8641942Z getattr(self, test_name)() 2023-01-11T22:51:00.8642296Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8642390Z fn() 2023-01-11T22:51:00.8642733Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8642856Z test(self, **param_kwargs) 2023-01-11T22:51:00.8643203Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8643325Z return func(*args, **kwargs) 2023-01-11T22:51:00.8643570Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8643675Z self.run_subtests( 2023-01-11T22:51:00.8644019Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8644164Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8644521Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8644661Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8645028Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8645145Z output = model(*input) 2023-01-11T22:51:00.8645458Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8645589Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8645956Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8646120Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8646464Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:51:00.8646578Z _lazy_init(state, module) 2023-01-11T22:51:00.8646917Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8647079Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8647539Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8647675Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8648008Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8648126Z return func(*args, **kwargs) 2023-01-11T22:51:00.8648483Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8648581Z p_assert( 2023-01-11T22:51:00.8648908Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8649027Z traceback.print_stack() 2023-01-11T22:51:00.8649152Z File "", line 1, in 2023-01-11T22:51:00.8649349Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8649485Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8649672Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8649817Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8650065Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8650172Z self.run() 2023-01-11T22:51:00.8650369Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8650512Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8650847Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8650973Z 
self.run_test(test_name, pipe) 2023-01-11T22:51:00.8651312Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8651432Z getattr(self, test_name)() 2023-01-11T22:51:00.8651786Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8651882Z fn() 2023-01-11T22:51:00.8652242Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8652360Z test(self, **param_kwargs) 2023-01-11T22:51:00.8652708Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8652824Z return func(*args, **kwargs) 2023-01-11T22:51:00.8653056Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8653163Z self.run_subtests( 2023-01-11T22:51:00.8653503Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8653659Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8654016Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8654165Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8654530Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8654643Z output = model(*input) 2023-01-11T22:51:00.8654947Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8655078Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8655448Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:51:00.8655616Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8655977Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8656095Z _lazy_init(state, module) 2023-01-11T22:51:00.8656512Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8656855Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8657249Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8657386Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8657713Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8657831Z return func(*args, **kwargs) 2023-01-11T22:51:00.8658193Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8658288Z p_assert( 2023-01-11T22:51:00.8658617Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8658742Z traceback.print_stack() 2023-01-11T22:51:00.8658960Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8659258Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.8659396Z File "", line 1, in 2023-01-11T22:51:00.8659604Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8659741Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8659935Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8660081Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8660286Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8660373Z self.run() 2023-01-11T22:51:00.8660568Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8660708Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8661048Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8661174Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8661533Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8661652Z getattr(self, test_name)() 2023-01-11T22:51:00.8661991Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8662087Z fn() 2023-01-11T22:51:00.8662441Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8662556Z test(self, **param_kwargs) 2023-01-11T22:51:00.8662902Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8663019Z return func(*args, **kwargs) 2023-01-11T22:51:00.8663262Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8663367Z self.run_subtests( 2023-01-11T22:51:00.8663698Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8663854Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8664209Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8664351Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8664715Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8664825Z output = model(*input) 2023-01-11T22:51:00.8665139Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8665340Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8665699Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8665871Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8666229Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8666343Z _lazy_init(state, module) 2023-01-11T22:51:00.8666687Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8666847Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8667236Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8667372Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8667699Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8667811Z return func(*args, **kwargs) 2023-01-11T22:51:00.8668220Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8668321Z p_assert( 2023-01-11T22:51:00.8668653Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8668772Z traceback.print_stack() 2023-01-11T22:51:00.8668896Z File "", line 1, in 2023-01-11T22:51:00.8669095Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8669220Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8669413Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8669551Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8669758Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8669856Z self.run() 2023-01-11T22:51:00.8670045Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8670183Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8670512Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8670627Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8670977Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8671093Z getattr(self, test_name)() 2023-01-11T22:51:00.8671444Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8671535Z fn() 2023-01-11T22:51:00.8671885Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8672004Z test(self, **param_kwargs) 2023-01-11T22:51:00.8672355Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.8672466Z return func(*args, **kwargs) 2023-01-11T22:51:00.8672711Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8672819Z self.run_subtests( 2023-01-11T22:51:00.8673155Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8673311Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8673665Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8673813Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8674182Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8674341Z output = model(*input) 2023-01-11T22:51:00.8674663Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8674790Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8675157Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8675325Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8675682Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8675796Z _lazy_init(state, module) 2023-01-11T22:51:00.8676142Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8676292Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8676689Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8676825Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.8677201Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8677327Z return func(*args, **kwargs) 2023-01-11T22:51:00.8677699Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8677795Z p_assert( 2023-01-11T22:51:00.8678124Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8678233Z traceback.print_stack() 2023-01-11T22:51:00.8678463Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8678687Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8679434Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8680162Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8680892Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8681629Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8682352Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8683079Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8683863Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8684587Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.8684715Z File "", line 1, in 2023-01-11T22:51:00.8684922Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8685059Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8685260Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8685396Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8685604Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8685747Z self.run() 2023-01-11T22:51:00.8685954Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8686096Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8686433Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8686559Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8686914Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8687020Z getattr(self, test_name)() 2023-01-11T22:51:00.8687372Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8687470Z fn() 2023-01-11T22:51:00.8687828Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8687945Z test(self, **param_kwargs) 2023-01-11T22:51:00.8688300Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8688422Z return func(*args, **kwargs) 2023-01-11T22:51:00.8688654Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8688765Z self.run_subtests( 2023-01-11T22:51:00.8689113Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8689269Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8689631Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8689778Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8690145Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8690264Z output = model(*input) 2023-01-11T22:51:00.8690584Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8690705Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8691073Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8691238Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8691597Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8691713Z _lazy_init(state, module) 2023-01-11T22:51:00.8692059Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8692276Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8692723Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8692849Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8693183Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8693303Z return func(*args, **kwargs) 2023-01-11T22:51:00.8693672Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8693769Z p_assert( 2023-01-11T22:51:00.8694095Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8694213Z traceback.print_stack() 2023-01-11T22:51:00.8694334Z File "", line 1, in 2023-01-11T22:51:00.8694528Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8694664Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8694909Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8695065Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8695268Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8695368Z self.run() 2023-01-11T22:51:00.8695561Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8695688Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8696023Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8696148Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8696507Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8696806Z getattr(self, test_name)() 2023-01-11T22:51:00.8697174Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8697266Z fn() 2023-01-11T22:51:00.8697622Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8697726Z test(self, **param_kwargs) 2023-01-11T22:51:00.8698073Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.8698191Z return func(*args, **kwargs) 2023-01-11T22:51:00.8698434Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:51:00.8698541Z self.run_subtests( 2023-01-11T22:51:00.8698881Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8699041Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8699402Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8699535Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8699898Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8700009Z output = model(*input) 2023-01-11T22:51:00.8700322Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8700454Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8700822Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8700987Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8701437Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8701541Z _lazy_init(state, module) 2023-01-11T22:51:00.8701889Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8702050Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8702440Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8702575Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.8702908Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8703026Z return func(*args, **kwargs) 2023-01-11T22:51:00.8703389Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8703479Z p_assert( 2023-01-11T22:51:00.8703809Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8703925Z traceback.print_stack() 2023-01-11T22:51:00.8704261Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8704500Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8704724Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8704948Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8705165Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8705373Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8705593Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8705815Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8706034Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8706249Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8706466Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.8706682Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8706898Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8707113Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8707316Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8707527Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8707746Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8707964Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8708176Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8708392Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8708606Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8708820Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8709024Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8709238Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8709990Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8710780Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8711506Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8712272Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8713002Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8713726Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8714445Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.8715167Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8715883Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8716596Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8717319Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8718033Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8718748Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8719516Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8720233Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8720988Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8721716Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8722431Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8723146Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8723865Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8724096Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8724321Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8724545Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8724769Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8724997Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8725211Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8725432Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8725652Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8725870Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8726087Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8726302Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8726523Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:00.8726740Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8727002Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8727224Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8727442Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8727656Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8727873Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8728089Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8728308Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8728521Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8728730Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8728944Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8729201Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.8730236Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_exec_order_utils.py:259: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 
2023-01-11T22:51:00.8730344Z world_indices[ 2023-01-11T22:51:00.8731067Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8731802Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8732524Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8733243Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8733962Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8734675Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8734784Z dist init r=1, world=2 2023-01-11T22:51:00.8735106Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.8735466Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.8735765Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.8736060Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.8736359Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.8736954Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.8737269Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.8737633Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.8737938Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.8738236Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.8738530Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.8738825Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.8738934Z dist init r=0, world=2 2023-01-11T22:51:00.8739252Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.8739559Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.8739863Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.8740146Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.8740441Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.8740743Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.8741044Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.8741336Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.8741627Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.8741919Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.8742286Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.8742578Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.8742673Z ok (6.314s) 2023-01-11T22:51:00.8743039Z test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 95356 2023-01-11T22:51:00.8743241Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 95357 2023-01-11T22:51:00.8743624Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.8743792Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.8744171Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.8744406Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.8744780Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.8744947Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.8745309Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.8745496Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.8745723Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.8745960Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.8746355Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.8746742Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:51:00.8746964Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.8747183Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.8748187Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.8748298Z warnings.warn( 2023-01-11T22:51:00.8749296Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.8749402Z warnings.warn( 2023-01-11T22:51:00.8750139Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8750863Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.8751644Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8752369Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8753135Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8753865Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8754586Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8755307Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8756036Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8756756Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8757474Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8758194Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8758909Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8759628Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8760385Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8761100Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8761869Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8762602Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8763317Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.8764034Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8764752Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8765465Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8766181Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8766900Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8767613Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8768328Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8769097Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8769811Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8770522Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8771280Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8772004Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8772718Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8773438Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8774145Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8774855Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8775571Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.8776283Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8777212Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8778023Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8778737Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8779448Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8780213Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8780937Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8781651Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8782368Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8783080Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8783790Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8784502Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8785213Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8785920Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8786687Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8787398Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8788108Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.8788860Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8789579Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8790291Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8791010Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8791724Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8792434Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8793208Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8793920Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8794636Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8795406Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8796117Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8796821Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8797575Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8798293Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8799003Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8799715Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8800429Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.8801133Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8801846Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8802556Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8803265Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8804033Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8804739Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8805450Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8806203Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8806921Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8807631Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8808351Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8809062Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8809772Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8810488Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8811196Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8811909Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8812677Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.8813393Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8814098Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8814850Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8815568Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8816277Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8817224Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8817941Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8818649Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8819362Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8820071Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8820783Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8821589Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8822304Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8822413Z dist init r=0, world=2 2023-01-11T22:51:00.8822506Z dist init r=1, world=2 2023-01-11T22:51:00.8822600Z ok (5.212s) 2023-01-11T22:51:00.8822968Z test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 95439 2023-01-11T22:51:00.8823265Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 95440 2023-01-11T22:51:00.8823643Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.8823815Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.8824181Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.8824363Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.8824709Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.8824880Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.8825255Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.8825438Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 
2023-01-11T22:51:00.8825685Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.8825925Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.8826318Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.8826703Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.8826927Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.8827134Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.8828148Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.8828262Z warnings.warn( 2023-01-11T22:51:00.8829260Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.8829425Z warnings.warn( 2023-01-11T22:51:00.8830164Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8830895Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8831620Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8832389Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8833121Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8833835Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.8834558Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8835274Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8835986Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8836705Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8837423Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8838139Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8838949Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8839662Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8840377Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8841140Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8841863Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8842573Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8843291Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8844004Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8844717Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8845432Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8846146Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.8846856Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8847622Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8848338Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8849048Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8849800Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.8849918Z dist init r=1, world=2 2023-01-11T22:51:00.8850021Z dist init r=0, world=2 2023-01-11T22:51:00.8850116Z ok (5.412s) 2023-01-11T22:51:00.8850483Z test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 95522 2023-01-11T22:51:00.8850696Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 95523 2023-01-11T22:51:00.8851066Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.8851245Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.8851619Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.8851805Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.8852167Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.8852337Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.8852709Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.8852877Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.8853118Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.8853353Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.8853748Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:51:00.8854133Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.8854354Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.8854571Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.8855574Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.8855732Z warnings.warn( 2023-01-11T22:51:00.8857019Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.8857132Z warnings.warn( 2023-01-11T22:51:00.8857873Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8858674Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8859424Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8860151Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8860865Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8861593Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8862315Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.8863036Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8863762Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8864479Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8865199Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8865982Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8866700Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8867458Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8868185Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8868901Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8869623Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8870341Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8871054Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8871771Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8872486Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8873202Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8873915Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8874677Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.8875386Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8876134Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8876858Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8877570Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8877677Z dist init r=0, world=2 2023-01-11T22:51:00.8877787Z dist init r=1, world=2 2023-01-11T22:51:00.8877884Z ok (5.413s) 2023-01-11T22:51:00.8878258Z test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 95605 2023-01-11T22:51:00.8878460Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 95606 2023-01-11T22:51:00.8878823Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.8878993Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.8879363Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.8879547Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.8879906Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.8880085Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.8880464Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.8880646Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.8880928Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.8881173Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.8881574Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.8881966Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:51:00.8882190Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.8882467Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.8883482Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.8883591Z warnings.warn( 2023-01-11T22:51:00.8884595Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:51:00.8884706Z warnings.warn( 2023-01-11T22:51:00.8884886Z File "", line 1, in 2023-01-11T22:51:00.8885088Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8885227Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8885429Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8885578Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8885786Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8885888Z self.run() 2023-01-11T22:51:00.8886086Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8886214Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8886554Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8886689Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8887043Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8887162Z getattr(self, test_name)() 2023-01-11T22:51:00.8887515Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8887606Z fn() 2023-01-11T22:51:00.8887965Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8888070Z test(self, **param_kwargs) 2023-01-11T22:51:00.8888423Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8888543Z return func(*args, **kwargs) 2023-01-11T22:51:00.8888828Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.8888943Z self.run_subtests( 
2023-01-11T22:51:00.8889295Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8889449Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8889802Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8889935Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8890304Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8890418Z output = model(*input) 2023-01-11T22:51:00.8890737Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8890871Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8891294Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8891468Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8891831Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8891946Z _lazy_init(state, module) 2023-01-11T22:51:00.8892280Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8892441Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8892882Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8893021Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8893360Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8893485Z return func(*args, **kwargs) 2023-01-11T22:51:00.8893905Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8894013Z p_assert( 2023-01-11T22:51:00.8894334Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8894454Z traceback.print_stack() 2023-01-11T22:51:00.8894579Z File "", line 1, in 2023-01-11T22:51:00.8894783Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8894919Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8895115Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8895261Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8895452Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8895558Z self.run() 2023-01-11T22:51:00.8895755Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8895898Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8896234Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8896363Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8896948Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8897078Z getattr(self, test_name)() 2023-01-11T22:51:00.8897427Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8897519Z fn() 2023-01-11T22:51:00.8897880Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8898001Z test(self, **param_kwargs) 2023-01-11T22:51:00.8898357Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.8898481Z return func(*args, **kwargs) 2023-01-11T22:51:00.8898772Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.8898883Z self.run_subtests( 2023-01-11T22:51:00.8899213Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8899372Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8899731Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8899875Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8900247Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8900450Z output = model(*input) 2023-01-11T22:51:00.8900773Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8900911Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8901267Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8901432Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8901792Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8901905Z _lazy_init(state, module) 2023-01-11T22:51:00.8902255Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8902421Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8902812Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8902952Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.8903331Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8903463Z return func(*args, **kwargs) 2023-01-11T22:51:00.8903841Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8903938Z p_assert( 2023-01-11T22:51:00.8904267Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8904392Z traceback.print_stack() 2023-01-11T22:51:00.8904514Z File "", line 1, in 2023-01-11T22:51:00.8904716Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8904839Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8905042Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8905190Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8905396Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8905496Z self.run() 2023-01-11T22:51:00.8905692Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8905833Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8906155Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8906283Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8906638Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8906758Z getattr(self, test_name)() 2023-01-11T22:51:00.8907111Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8907207Z fn() 2023-01-11T22:51:00.8907570Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8907691Z test(self, **param_kwargs) 2023-01-11T22:51:00.8908030Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8908150Z return func(*args, **kwargs) 2023-01-11T22:51:00.8908440Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.8908549Z self.run_subtests( 2023-01-11T22:51:00.8908893Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8909050Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8909409Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8909613Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8909974Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8910090Z output = model(*input) 2023-01-11T22:51:00.8910413Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8910545Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8910913Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8911082Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8911439Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8911560Z _lazy_init(state, module) 2023-01-11T22:51:00.8911904Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8912098Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8912500Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8912637Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8912969Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8913093Z return func(*args, **kwargs) 2023-01-11T22:51:00.8913464Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8913563Z p_assert( 2023-01-11T22:51:00.8913893Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8914005Z traceback.print_stack() 2023-01-11T22:51:00.8914130Z File "", line 1, in 2023-01-11T22:51:00.8914330Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8914469Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8914666Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8914814Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8915018Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8915105Z self.run() 2023-01-11T22:51:00.8915299Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8915440Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8915778Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8915905Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8916266Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8916384Z getattr(self, test_name)() 2023-01-11T22:51:00.8916735Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8916817Z fn() 2023-01-11T22:51:00.8917172Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8917295Z test(self, **param_kwargs) 2023-01-11T22:51:00.8917649Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8917773Z return func(*args, **kwargs) 2023-01-11T22:51:00.8918060Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.8918226Z self.run_subtests( 2023-01-11T22:51:00.8918571Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8918718Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8919076Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8919227Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8919593Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8919709Z output = model(*input) 2023-01-11T22:51:00.8920027Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8920157Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8920520Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 
2023-01-11T22:51:00.8920679Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8921080Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8921203Z _lazy_init(state, module) 2023-01-11T22:51:00.8921552Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8921712Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8922102Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8922239Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8922572Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8922678Z return func(*args, **kwargs) 2023-01-11T22:51:00.8923054Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8923154Z p_assert( 2023-01-11T22:51:00.8923483Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8923606Z traceback.print_stack() 2023-01-11T22:51:00.8923734Z File "", line 1, in 2023-01-11T22:51:00.8923937Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8924075Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8924257Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8924404Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8924607Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8924708Z self.run() 2023-01-11T22:51:00.8924903Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 
2023-01-11T22:51:00.8925050Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8925381Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8925496Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8925852Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8925972Z getattr(self, test_name)() 2023-01-11T22:51:00.8926320Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8926413Z fn() 2023-01-11T22:51:00.8926775Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8926893Z test(self, **param_kwargs) 2023-01-11T22:51:00.8927244Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8927413Z return func(*args, **kwargs) 2023-01-11T22:51:00.8927704Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.8927815Z self.run_subtests( 2023-01-11T22:51:00.8928161Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8928318Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8928674Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8928818Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8929185Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8929287Z output = model(*input) 2023-01-11T22:51:00.8929610Z File 
"/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8929740Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8930156Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8930329Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8930694Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8930808Z _lazy_init(state, module) 2023-01-11T22:51:00.8931154Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8931317Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8931692Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8931837Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8932172Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8932296Z return func(*args, **kwargs) 2023-01-11T22:51:00.8932667Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8932766Z p_assert( 2023-01-11T22:51:00.8933096Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8933216Z traceback.print_stack() 2023-01-11T22:51:00.8933327Z File "", line 1, in 2023-01-11T22:51:00.8933529Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8933667Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8933859Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8934009Z 
return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8934213Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8934317Z self.run() 2023-01-11T22:51:00.8934499Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8934639Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8934974Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8935099Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8935454Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8935570Z getattr(self, test_name)() 2023-01-11T22:51:00.8935919Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8936014Z fn() 2023-01-11T22:51:00.8936415Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8936708Z test(self, **param_kwargs) 2023-01-11T22:51:00.8937084Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8937200Z return func(*args, **kwargs) 2023-01-11T22:51:00.8937489Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.8937598Z self.run_subtests( 2023-01-11T22:51:00.8937945Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8938102Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8938441Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8938593Z fsdp_loss = 
self._train_for_several_steps( 2023-01-11T22:51:00.8938963Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8939152Z output = model(*input) 2023-01-11T22:51:00.8939484Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8939617Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8939986Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8940156Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8940501Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8940613Z _lazy_init(state, module) 2023-01-11T22:51:00.8940959Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8941126Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8941520Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8941657Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8941990Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8942112Z return func(*args, **kwargs) 2023-01-11T22:51:00.8942464Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8942566Z p_assert( 2023-01-11T22:51:00.8942892Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8943013Z traceback.print_stack() 2023-01-11T22:51:00.8943756Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still 
has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8944489Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.8944615Z File "", line 1, in 2023-01-11T22:51:00.8944819Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8944959Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8945158Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8945359Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8945565Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8945664Z self.run() 2023-01-11T22:51:00.8945863Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8946007Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8946345Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8946473Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8946814Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8946934Z getattr(self, test_name)() 2023-01-11T22:51:00.8947287Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8947384Z fn() 2023-01-11T22:51:00.8947737Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8947855Z test(self, **param_kwargs) 2023-01-11T22:51:00.8948263Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8948391Z return func(*args, **kwargs) 2023-01-11T22:51:00.8948666Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.8948784Z self.run_subtests( 2023-01-11T22:51:00.8949139Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8949295Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8949652Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8949799Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8950173Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8950286Z output = model(*input) 2023-01-11T22:51:00.8950593Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8950727Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8951096Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8951265Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8951622Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8951737Z _lazy_init(state, module) 2023-01-11T22:51:00.8952084Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8952247Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8952627Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8952765Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8953097Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8953218Z return func(*args, **kwargs) 2023-01-11T22:51:00.8953589Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8953686Z p_assert( 2023-01-11T22:51:00.8954015Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8954136Z traceback.print_stack() 2023-01-11T22:51:00.8954247Z File "", line 1, in 2023-01-11T22:51:00.8954507Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8954647Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8954846Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8954993Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8955199Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8955297Z self.run() 2023-01-11T22:51:00.8955491Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8955619Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8955954Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8956080Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8956430Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8956551Z getattr(self, test_name)() 2023-01-11T22:51:00.8956911Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8957050Z fn() 2023-01-11T22:51:00.8957418Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8957523Z test(self, **param_kwargs) 2023-01-11T22:51:00.8957871Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8957988Z return func(*args, **kwargs) 2023-01-11T22:51:00.8958278Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.8958386Z self.run_subtests( 2023-01-11T22:51:00.8958729Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8958891Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8959247Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8959381Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8959750Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8959869Z output = model(*input) 2023-01-11T22:51:00.8960185Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8960320Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8960689Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 
2023-01-11T22:51:00.8960856Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8961218Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8961321Z _lazy_init(state, module) 2023-01-11T22:51:00.8961669Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8961833Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8962224Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8962358Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8962691Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8962813Z return func(*args, **kwargs) 2023-01-11T22:51:00.8963179Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8963317Z p_assert( 2023-01-11T22:51:00.8963651Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8963771Z traceback.print_stack() 2023-01-11T22:51:00.8963902Z File "", line 1, in 2023-01-11T22:51:00.8964107Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8964243Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8964441Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8964573Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8964780Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8964880Z self.run() 2023-01-11T22:51:00.8965074Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 
2023-01-11T22:51:00.8965215Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8965555Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8965684Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8966084Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8966196Z getattr(self, test_name)() 2023-01-11T22:51:00.8966547Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8966641Z fn() 2023-01-11T22:51:00.8966998Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8967117Z test(self, **param_kwargs) 2023-01-11T22:51:00.8967462Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8967581Z return func(*args, **kwargs) 2023-01-11T22:51:00.8967871Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.8967972Z self.run_subtests( 2023-01-11T22:51:00.8968317Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8968472Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8968829Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8968976Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8969346Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8969461Z output = model(*input) 2023-01-11T22:51:00.8969777Z File 
"/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8969898Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8970269Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8970439Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8970803Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8970919Z _lazy_init(state, module) 2023-01-11T22:51:00.8971266Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8971429Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8971818Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8971956Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8972274Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8972450Z return func(*args, **kwargs) 2023-01-11T22:51:00.8972828Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8972927Z p_assert( 2023-01-11T22:51:00.8973253Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8973379Z traceback.print_stack() 2023-01-11T22:51:00.8973504Z File "", line 1, in 2023-01-11T22:51:00.8973694Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8973828Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8974024Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8974167Z 
return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8974371Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8974476Z self.run() 2023-01-11T22:51:00.8974674Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8974888Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8975220Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8975348Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8975701Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8975821Z getattr(self, test_name)() 2023-01-11T22:51:00.8976172Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8976265Z fn() 2023-01-11T22:51:00.8976888Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8977020Z test(self, **param_kwargs) 2023-01-11T22:51:00.8977366Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8977492Z return func(*args, **kwargs) 2023-01-11T22:51:00.8977783Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.8977894Z self.run_subtests( 2023-01-11T22:51:00.8978237Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8978396Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8978755Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8978902Z fsdp_loss = 
self._train_for_several_steps( 2023-01-11T22:51:00.8979255Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8979374Z output = model(*input) 2023-01-11T22:51:00.8979691Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8979823Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8980193Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8980364Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8980727Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8980842Z _lazy_init(state, module) 2023-01-11T22:51:00.8981175Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8981336Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8981816Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8981953Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8982285Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8982409Z return func(*args, **kwargs) 2023-01-11T22:51:00.8982778Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8982876Z p_assert( 2023-01-11T22:51:00.8983190Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8983309Z traceback.print_stack() 2023-01-11T22:51:00.8983436Z File "", line 1, in 2023-01-11T22:51:00.8983639Z File 
"/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8983781Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8983978Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8984125Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8984375Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8984488Z self.run() 2023-01-11T22:51:00.8984685Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8984825Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8985162Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8985290Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8985641Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8985758Z getattr(self, test_name)() 2023-01-11T22:51:00.8986099Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8986192Z fn() 2023-01-11T22:51:00.8986552Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8986670Z test(self, **param_kwargs) 2023-01-11T22:51:00.8987020Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8987142Z return func(*args, **kwargs) 2023-01-11T22:51:00.8987432Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.8987540Z self.run_subtests( 2023-01-11T22:51:00.8987872Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 
2023-01-11T22:51:00.8988026Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8988390Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8988540Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8988913Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8989030Z output = model(*input) 2023-01-11T22:51:00.8989348Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8989482Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8989836Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8990003Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.8990368Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.8990541Z _lazy_init(state, module) 2023-01-11T22:51:00.8990890Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.8991055Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.8991446Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.8991585Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.8991918Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.8992026Z return func(*args, **kwargs) 2023-01-11T22:51:00.8992395Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.8992495Z p_assert( 
2023-01-11T22:51:00.8992877Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.8993003Z traceback.print_stack() 2023-01-11T22:51:00.8993130Z File "", line 1, in 2023-01-11T22:51:00.8993383Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.8993514Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.8993710Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.8993857Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.8994066Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.8994165Z self.run() 2023-01-11T22:51:00.8994364Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.8994508Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.8994845Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.8994963Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.8995316Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.8995438Z getattr(self, test_name)() 2023-01-11T22:51:00.8995787Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.8995879Z fn() 2023-01-11T22:51:00.8996233Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.8996351Z test(self, **param_kwargs) 2023-01-11T22:51:00.8996700Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.8996807Z return func(*args, **kwargs) 2023-01-11T22:51:00.8997094Z File 
"/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.8997207Z self.run_subtests( 2023-01-11T22:51:00.8997553Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.8997716Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.8998078Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.8998230Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.8998600Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.8998702Z output = model(*input) 2023-01-11T22:51:00.8999020Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.8999153Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.8999520Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.8999755Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9000120Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9000235Z _lazy_init(state, module) 2023-01-11T22:51:00.9000584Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9000734Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9001127Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9001264Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9001597Z File 
"/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9001717Z return func(*args, **kwargs) 2023-01-11T22:51:00.9002089Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9002187Z p_assert( 2023-01-11T22:51:00.9002560Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9002676Z traceback.print_stack() 2023-01-11T22:51:00.9002801Z File "", line 1, in 2023-01-11T22:51:00.9003001Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9003139Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9003336Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9003480Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9003685Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9003772Z self.run() 2023-01-11T22:51:00.9003974Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9004116Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9004457Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9004586Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9004940Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9005059Z getattr(self, test_name)() 2023-01-11T22:51:00.9005410Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9005490Z fn() 2023-01-11T22:51:00.9005850Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 
2023-01-11T22:51:00.9005970Z test(self, **param_kwargs) 2023-01-11T22:51:00.9006317Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9006437Z return func(*args, **kwargs) 2023-01-11T22:51:00.9006728Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9006836Z self.run_subtests( 2023-01-11T22:51:00.9007182Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9007326Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9007683Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9007830Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9008199Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9008369Z output = model(*input) 2023-01-11T22:51:00.9008692Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9008829Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9009201Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9009356Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9009725Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9009839Z _lazy_init(state, module) 2023-01-11T22:51:00.9010184Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9010351Z _share_state_and_init_handle_attrs(state, root_module) 
2023-01-11T22:51:00.9010743Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9010890Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9011279Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9011395Z return func(*args, **kwargs) 2023-01-11T22:51:00.9011766Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9011867Z p_assert( 2023-01-11T22:51:00.9012197Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9012319Z traceback.print_stack() 2023-01-11T22:51:00.9012446Z File "", line 1, in 2023-01-11T22:51:00.9012650Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9012786Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9012969Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9013122Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9013334Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9013434Z self.run() 2023-01-11T22:51:00.9013630Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9013769Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9014101Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9014227Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9014568Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9014690Z getattr(self, test_name)() 2023-01-11T22:51:00.9015040Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9015136Z fn() 2023-01-11T22:51:00.9015493Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9015611Z test(self, **param_kwargs) 2023-01-11T22:51:00.9015959Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9016081Z return func(*args, **kwargs) 2023-01-11T22:51:00.9016357Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9016467Z self.run_subtests( 2023-01-11T22:51:00.9017058Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9017213Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9017570Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9017801Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9018174Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9018288Z output = model(*input) 2023-01-11T22:51:00.9018595Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9018728Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9019099Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9019268Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9019627Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:51:00.9019743Z _lazy_init(state, module) 2023-01-11T22:51:00.9020087Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9020253Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9020691Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9020838Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9021177Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9021299Z return func(*args, **kwargs) 2023-01-11T22:51:00.9021667Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9021765Z p_assert( 2023-01-11T22:51:00.9022094Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9022214Z traceback.print_stack() 2023-01-11T22:51:00.9022330Z File "", line 1, in 2023-01-11T22:51:00.9022535Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9022671Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9022865Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9023012Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9023218Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9023314Z self.run() 2023-01-11T22:51:00.9023496Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9023640Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9023979Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9024109Z 
self.run_test(test_name, pipe) 2023-01-11T22:51:00.9024464Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9024587Z getattr(self, test_name)() 2023-01-11T22:51:00.9024941Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9025034Z fn() 2023-01-11T22:51:00.9025376Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9025495Z test(self, **param_kwargs) 2023-01-11T22:51:00.9025844Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9025967Z return func(*args, **kwargs) 2023-01-11T22:51:00.9026253Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9026362Z self.run_subtests( 2023-01-11T22:51:00.9026764Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9026923Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9027266Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9027415Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9027782Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9027902Z output = model(*input) 2023-01-11T22:51:00.9028219Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9028355Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9028722Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9028891Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9029239Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9029399Z _lazy_init(state, module) 2023-01-11T22:51:00.9029753Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9029915Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9030306Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9030443Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9030777Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9030901Z return func(*args, **kwargs) 2023-01-11T22:51:00.9031257Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9031361Z p_assert( 2023-01-11T22:51:00.9031695Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9031818Z traceback.print_stack() 2023-01-11T22:51:00.9031945Z File "", line 1, in 2023-01-11T22:51:00.9032145Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9032284Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9032481Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9032613Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9032819Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9032916Z self.run() 
2023-01-11T22:51:00.9033113Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9033258Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9033595Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9033727Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9034081Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9034186Z getattr(self, test_name)() 2023-01-11T22:51:00.9034538Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9034631Z fn() 2023-01-11T22:51:00.9034986Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9035101Z test(self, **param_kwargs) 2023-01-11T22:51:00.9035452Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9035629Z return func(*args, **kwargs) 2023-01-11T22:51:00.9035913Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9036013Z self.run_subtests( 2023-01-11T22:51:00.9036358Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9036512Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9036869Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9037018Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9037389Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 
2023-01-11T22:51:00.9037505Z output = model(*input) 2023-01-11T22:51:00.9037828Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9037952Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9038372Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9038550Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9038911Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9039030Z _lazy_init(state, module) 2023-01-11T22:51:00.9039379Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9039544Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9039936Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9040059Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9040403Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9046796Z return func(*args, **kwargs) 2023-01-11T22:51:00.9047240Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9047346Z p_assert( 2023-01-11T22:51:00.9047689Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9047812Z traceback.print_stack() 2023-01-11T22:51:00.9047938Z File "", line 1, in 2023-01-11T22:51:00.9048145Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9048270Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9048468Z File 
"/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9048611Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9048824Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9048921Z self.run() 2023-01-11T22:51:00.9049124Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9049268Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9049608Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9049725Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9050081Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9050200Z getattr(self, test_name)() 2023-01-11T22:51:00.9050556Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9050647Z fn() 2023-01-11T22:51:00.9051002Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9051223Z test(self, **param_kwargs) 2023-01-11T22:51:00.9051582Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9051692Z return func(*args, **kwargs) 2023-01-11T22:51:00.9051980Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9052092Z self.run_subtests( 2023-01-11T22:51:00.9052440Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9052598Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9052956Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9053103Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9053478Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9053581Z output = model(*input) 2023-01-11T22:51:00.9053951Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9054092Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9054465Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9054636Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9054997Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9055113Z _lazy_init(state, module) 2023-01-11T22:51:00.9055460Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9055615Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9056010Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9056147Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9056478Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9056867Z return func(*args, **kwargs) 2023-01-11T22:51:00.9057261Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9057361Z p_assert( 2023-01-11T22:51:00.9057693Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in 
p_assert 2023-01-11T22:51:00.9057803Z traceback.print_stack() 2023-01-11T22:51:00.9057927Z File "", line 1, in 2023-01-11T22:51:00.9058132Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9058272Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9058471Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9058618Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9058823Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9058909Z self.run() 2023-01-11T22:51:00.9059105Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9059243Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9059578Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9059707Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9060059Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9060279Z getattr(self, test_name)() 2023-01-11T22:51:00.9060633Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9060714Z fn() 2023-01-11T22:51:00.9061070Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9061188Z test(self, **param_kwargs) 2023-01-11T22:51:00.9061532Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9061649Z return func(*args, **kwargs) 2023-01-11T22:51:00.9061941Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9062053Z 
self.run_subtests( 2023-01-11T22:51:00.9062398Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9062543Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9062901Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9063105Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9063490Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9063602Z output = model(*input) 2023-01-11T22:51:00.9063921Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9064056Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9064424Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9064581Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9064943Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9065060Z _lazy_init(state, module) 2023-01-11T22:51:00.9065409Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9065573Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9065965Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9066097Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9066425Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9066533Z return func(*args, **kwargs) 2023-01-11T22:51:00.9066903Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9067001Z p_assert( 2023-01-11T22:51:00.9067337Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9067457Z traceback.print_stack() 2023-01-11T22:51:00.9068200Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9068934Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.9069057Z File "", line 1, in 2023-01-11T22:51:00.9069257Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9069449Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9069633Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9069784Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9069991Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9070089Z self.run() 2023-01-11T22:51:00.9070286Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9070425Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9070764Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9070879Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9071233Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9071350Z getattr(self, test_name)() 2023-01-11T22:51:00.9071705Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9071797Z fn() 2023-01-11T22:51:00.9072201Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9072328Z test(self, **param_kwargs) 2023-01-11T22:51:00.9072680Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9072791Z return func(*args, **kwargs) 2023-01-11T22:51:00.9073082Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9073192Z self.run_subtests( 2023-01-11T22:51:00.9073536Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9073692Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9074058Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9074209Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9074580Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9074695Z output = model(*input) 2023-01-11T22:51:00.9075003Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9075136Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9075506Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9075674Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9076035Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9076155Z _lazy_init(state, module) 2023-01-11T22:51:00.9076503Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9076666Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9077042Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9077179Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9077511Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9077631Z return func(*args, **kwargs) 2023-01-11T22:51:00.9077999Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9078097Z p_assert( 2023-01-11T22:51:00.9078424Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9078604Z traceback.print_stack() 2023-01-11T22:51:00.9078715Z File "", line 1, in 2023-01-11T22:51:00.9078923Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9079054Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9079245Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9079385Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9079592Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9079691Z self.run() 2023-01-11T22:51:00.9079875Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9080013Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9080347Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9080477Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9080888Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9081012Z getattr(self, test_name)() 2023-01-11T22:51:00.9081367Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9081460Z fn() 2023-01-11T22:51:00.9081804Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9081919Z test(self, **param_kwargs) 2023-01-11T22:51:00.9082260Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.9082378Z return func(*args, **kwargs) 2023-01-11T22:51:00.9082670Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9082785Z self.run_subtests( 2023-01-11T22:51:00.9083133Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9083289Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9083631Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9083779Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9084144Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9084261Z output = model(*input) 2023-01-11T22:51:00.9084577Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9084712Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9085080Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9085255Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9085603Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9085721Z _lazy_init(state, module) 2023-01-11T22:51:00.9086066Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9086225Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9086614Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9086752Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.9087075Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9087247Z return func(*args, **kwargs) 2023-01-11T22:51:00.9087608Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9087707Z p_assert( 2023-01-11T22:51:00.9088036Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9088156Z traceback.print_stack() 2023-01-11T22:51:00.9088279Z File "", line 1, in 2023-01-11T22:51:00.9088478Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9088612Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9088804Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9088936Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9089139Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9089243Z self.run() 2023-01-11T22:51:00.9089440Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9089583Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9089967Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9090103Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9090447Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9090567Z getattr(self, test_name)() 2023-01-11T22:51:00.9090917Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9091011Z fn() 2023-01-11T22:51:00.9091366Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9091485Z test(self, **param_kwargs) 2023-01-11T22:51:00.9091838Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9091952Z return func(*args, **kwargs) 2023-01-11T22:51:00.9092230Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9092339Z self.run_subtests( 2023-01-11T22:51:00.9092739Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9092897Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9093255Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9093403Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9093772Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9093891Z output = model(*input) 2023-01-11T22:51:00.9094212Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9094336Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9094706Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9094877Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9095237Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9095350Z _lazy_init(state, module) 2023-01-11T22:51:00.9095700Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9095863Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9096256Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9096442Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9097062Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9097187Z return func(*args, **kwargs) 2023-01-11T22:51:00.9097567Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9097665Z p_assert( 2023-01-11T22:51:00.9097995Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9098114Z traceback.print_stack() 2023-01-11T22:51:00.9098237Z File "", line 1, in 2023-01-11T22:51:00.9098427Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9098565Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9098763Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9098907Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9099187Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9099297Z self.run() 2023-01-11T22:51:00.9099493Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9099620Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9099952Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9100081Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9100435Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9100556Z getattr(self, test_name)() 2023-01-11T22:51:00.9100908Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9101005Z fn() 2023-01-11T22:51:00.9101362Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9101470Z test(self, **param_kwargs) 2023-01-11T22:51:00.9101820Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9101938Z return func(*args, **kwargs) 2023-01-11T22:51:00.9102221Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9102331Z self.run_subtests( 2023-01-11T22:51:00.9102672Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9102828Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9103185Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9103323Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9103690Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9103801Z output = model(*input) 2023-01-11T22:51:00.9104120Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9104251Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9104618Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 
2023-01-11T22:51:00.9104788Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9105148Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9105251Z _lazy_init(state, module) 2023-01-11T22:51:00.9105679Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9105846Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9106234Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9106376Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9106707Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9106824Z return func(*args, **kwargs) 2023-01-11T22:51:00.9107190Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9107277Z p_assert( 2023-01-11T22:51:00.9107607Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9107730Z traceback.print_stack() 2023-01-11T22:51:00.9107855Z File "", line 1, in 2023-01-11T22:51:00.9108057Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9108241Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9108446Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9108593Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9108785Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9108888Z self.run() 2023-01-11T22:51:00.9109085Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 
2023-01-11T22:51:00.9109226Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9109563Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9109691Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9110050Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9110156Z getattr(self, test_name)() 2023-01-11T22:51:00.9110512Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9110611Z fn() 2023-01-11T22:51:00.9110971Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9111090Z test(self, **param_kwargs) 2023-01-11T22:51:00.9111437Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9111559Z return func(*args, **kwargs) 2023-01-11T22:51:00.9111848Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9111944Z self.run_subtests( 2023-01-11T22:51:00.9112294Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9112449Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9112805Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9112951Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9113318Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9113429Z output = model(*input) 2023-01-11T22:51:00.9113745Z File 
"/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9113867Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9114237Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9114459Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9114822Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9114939Z _lazy_init(state, module) 2023-01-11T22:51:00.9115284Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9115443Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9115829Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9115966Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9116285Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9116407Z return func(*args, **kwargs) 2023-01-11T22:51:00.9116775Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9116874Z p_assert( 2023-01-11T22:51:00.9117249Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9117378Z traceback.print_stack() 2023-01-11T22:51:00.9117498Z File "", line 1, in 2023-01-11T22:51:00.9117689Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9117828Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9118023Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9118170Z 
return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9118379Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9118479Z self.run() 2023-01-11T22:51:00.9118674Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9118818Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9119141Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9119273Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9119627Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9119747Z getattr(self, test_name)() 2023-01-11T22:51:00.9120097Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9120192Z fn() 2023-01-11T22:51:00.9120546Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9120662Z test(self, **param_kwargs) 2023-01-11T22:51:00.9120998Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9121122Z return func(*args, **kwargs) 2023-01-11T22:51:00.9121412Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9121521Z self.run_subtests( 2023-01-11T22:51:00.9121866Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9122025Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9122381Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9122527Z fsdp_loss = 
self._train_for_several_steps( 2023-01-11T22:51:00.9122881Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9122998Z output = model(*input) 2023-01-11T22:51:00.9123320Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9123540Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9123914Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9124080Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9124437Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9124549Z _lazy_init(state, module) 2023-01-11T22:51:00.9124880Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9125040Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9125430Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9125570Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9125900Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9126064Z return func(*args, **kwargs) 2023-01-11T22:51:00.9126444Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9126547Z p_assert( 2023-01-11T22:51:00.9126862Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9126985Z traceback.print_stack() 2023-01-11T22:51:00.9127725Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still 
has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9128466Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9129198Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9129933Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9130660Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9131379Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.9132105Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9132824Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9133597Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9134309Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9135062Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9135791Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9136507Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9137431Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9138154Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9138863Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9139577Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9140291Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9141001Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9141713Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9142519Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9143236Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9144021Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.9144752Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9145470Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9146185Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9146903Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9147613Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9148325Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9149039Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9149748Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9150466Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9151238Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9151948Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9152697Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9153420Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9154133Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9154842Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9155560Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9156273Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.9156985Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9157695Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9158407Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9159123Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9159888Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9160597Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9161308Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9162065Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9162789Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9163495Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9164214Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9164929Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9165640Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9166355Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9167062Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9167771Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9168535Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.9169243Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9169953Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9170707Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9171425Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9172131Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9172839Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9173546Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9174260Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9174972Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9175682Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9176393Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9177382Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9178098Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9178810Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9179615Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9180345Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9181116Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.9181840Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9182553Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9183267Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9183979Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9184676Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9185386Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9186182Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9186892Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9187600Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9188361Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9189084Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9189796Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9190513Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9191218Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9191929Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9192695Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9193416Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.9194126Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9194899Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9195612Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9196323Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9197075Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9197792Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9198501Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9199220Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9199929Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9200640Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9201356Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9202065Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9202775Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9203542Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9204253Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9204960Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9205727Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.9206450Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9207161Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9207875Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9208590Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9208700Z dist init r=1, world=2 2023-01-11T22:51:00.9209021Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.9209337Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.9209639Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 
2023-01-11T22:51:00.9209942Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.9210241Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.9210535Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.9210866Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.9211163Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.9211456Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.9211751Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.9212044Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.9212339Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.9212451Z dist init r=0, world=2 2023-01-11T22:51:00.9212808Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 
2023-01-11T22:51:00.9213117Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.9213417Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.9213712Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.9214008Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.9214296Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.9214589Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.9214881Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.9215171Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.9215462Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.9215757Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 
2023-01-11T22:51:00.9216051Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.9216147Z ok (5.512s) 2023-01-11T22:51:00.9216511Z test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 95688 2023-01-11T22:51:00.9216906Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 95689 2023-01-11T22:51:00.9217276Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.9217451Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.9217923Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.9218116Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.9218476Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.9218644Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.9219016Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.9219199Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.9219438Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.9219663Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.9220056Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:51:00.9220501Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.9220736Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.9220963Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.9221967Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.9222078Z warnings.warn( 2023-01-11T22:51:00.9223081Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:51:00.9223188Z warnings.warn( 2023-01-11T22:51:00.9223312Z File "<string>", line 1, in <module> 2023-01-11T22:51:00.9223521Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9223646Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9223846Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9223991Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9224201Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9224304Z self.run() 2023-01-11T22:51:00.9224502Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9224645Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9224970Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9225100Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9225454Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9225574Z getattr(self, test_name)() 2023-01-11T22:51:00.9225929Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9226024Z fn() 2023-01-11T22:51:00.9226381Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9226559Z test(self, **param_kwargs) 2023-01-11T22:51:00.9226904Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9227028Z return func(*args, **kwargs) 2023-01-11T22:51:00.9227325Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9227437Z self.run_subtests( 
2023-01-11T22:51:00.9227783Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9227940Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9228301Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9228450Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9228821Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9228924Z output = model(*input) 2023-01-11T22:51:00.9229286Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9229426Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9229796Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9229965Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9230324Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9230440Z _lazy_init(state, module) 2023-01-11T22:51:00.9230787Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9230940Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9231332Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9231471Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9231802Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9231922Z return func(*args, **kwargs) 2023-01-11T22:51:00.9232288Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9232386Z p_assert( 2023-01-11T22:51:00.9232716Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9232826Z traceback.print_stack() 2023-01-11T22:51:00.9232947Z File "", line 1, in 2023-01-11T22:51:00.9233147Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9233286Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9233485Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9233636Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9233844Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9233931Z self.run() 2023-01-11T22:51:00.9234129Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9234269Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9234605Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9234734Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9235090Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9235209Z getattr(self, test_name)() 2023-01-11T22:51:00.9235620Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9235700Z fn() 2023-01-11T22:51:00.9236061Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9236180Z test(self, **param_kwargs) 2023-01-11T22:51:00.9236530Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.9236649Z return func(*args, **kwargs) 2023-01-11T22:51:00.9236939Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9237050Z self.run_subtests( 2023-01-11T22:51:00.9237396Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9237544Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9237903Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9238096Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9238475Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9238593Z output = model(*input) 2023-01-11T22:51:00.9238912Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9239050Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9239417Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9239572Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9239924Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9240039Z _lazy_init(state, module) 2023-01-11T22:51:00.9240390Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9240554Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9240946Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9241082Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.9241413Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9241521Z return func(*args, **kwargs) 2023-01-11T22:51:00.9241891Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9241990Z p_assert( 2023-01-11T22:51:00.9242320Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9242440Z traceback.print_stack() 2023-01-11T22:51:00.9242565Z File "<string>", line 1, in <module> 2023-01-11T22:51:00.9242770Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9242909Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9243091Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9243235Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9243438Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9243538Z self.run() 2023-01-11T22:51:00.9243736Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9243876Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9244210Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9244376Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9244735Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9244850Z getattr(self, test_name)() 2023-01-11T22:51:00.9245202Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9245293Z fn() 2023-01-11T22:51:00.9245646Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9245761Z test(self, **param_kwargs) 2023-01-11T22:51:00.9246110Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9246218Z return func(*args, **kwargs) 2023-01-11T22:51:00.9246506Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9246618Z self.run_subtests( 2023-01-11T22:51:00.9247011Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9247175Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9247532Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9247684Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9248048Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9248164Z output = model(*input) 2023-01-11T22:51:00.9248471Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9248600Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9248969Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9249135Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9249496Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9249611Z _lazy_init(state, module) 2023-01-11T22:51:00.9249958Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9250120Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9250497Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9250635Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9250967Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9251090Z return func(*args, **kwargs) 2023-01-11T22:51:00.9251464Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9251561Z p_assert( 2023-01-11T22:51:00.9251892Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9252011Z traceback.print_stack() 2023-01-11T22:51:00.9252123Z File "", line 1, in 2023-01-11T22:51:00.9252324Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9252462Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9252656Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9252799Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9253001Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9253152Z self.run() 2023-01-11T22:51:00.9253335Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9253474Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9253811Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9253939Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9254294Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9254413Z getattr(self, test_name)() 2023-01-11T22:51:00.9254765Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9254862Z fn() 2023-01-11T22:51:00.9255207Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9255327Z test(self, **param_kwargs) 2023-01-11T22:51:00.9255680Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9255800Z return func(*args, **kwargs) 2023-01-11T22:51:00.9256164Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9256283Z self.run_subtests( 2023-01-11T22:51:00.9256807Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9256972Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9257325Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9257473Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9257843Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9257966Z output = model(*input) 2023-01-11T22:51:00.9258285Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9258423Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9258794Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 
2023-01-11T22:51:00.9258964Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9259308Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9259429Z _lazy_init(state, module) 2023-01-11T22:51:00.9259772Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9259935Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9260326Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9260467Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9260805Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9260925Z return func(*args, **kwargs) 2023-01-11T22:51:00.9261279Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9261376Z p_assert( 2023-01-11T22:51:00.9261701Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9261821Z traceback.print_stack() 2023-01-11T22:51:00.9261943Z File "<string>", line 1, in <module> 2023-01-11T22:51:00.9262145Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9262278Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9262560Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9262692Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9262904Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9263006Z self.run() 2023-01-11T22:51:00.9263205Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 
2023-01-11T22:51:00.9263348Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9263690Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9263820Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9264159Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9264280Z getattr(self, test_name)() 2023-01-11T22:51:00.9264631Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9264730Z fn() 2023-01-11T22:51:00.9265158Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9265287Z test(self, **param_kwargs) 2023-01-11T22:51:00.9265644Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9265763Z return func(*args, **kwargs) 2023-01-11T22:51:00.9266039Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9266149Z self.run_subtests( 2023-01-11T22:51:00.9266496Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9266653Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9267012Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9267161Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9267533Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9267649Z output = model(*input) 2023-01-11T22:51:00.9267955Z File 
"/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9268087Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9268456Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9268624Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9268985Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9269098Z _lazy_init(state, module) 2023-01-11T22:51:00.9269451Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9269613Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9270002Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9270127Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9270456Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9270572Z return func(*args, **kwargs) 2023-01-11T22:51:00.9270939Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9271038Z p_assert( 2023-01-11T22:51:00.9271369Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9271551Z traceback.print_stack() 2023-01-11T22:51:00.9271662Z File "", line 1, in 2023-01-11T22:51:00.9271869Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9272011Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9272209Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9272358Z 
return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9272564Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9272665Z self.run() 2023-01-11T22:51:00.9272858Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9272985Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9273319Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9273449Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9273808Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9273928Z getattr(self, test_name)() 2023-01-11T22:51:00.9274328Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9274426Z fn() 2023-01-11T22:51:00.9274784Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9274889Z test(self, **param_kwargs) 2023-01-11T22:51:00.9275239Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9275357Z return func(*args, **kwargs) 2023-01-11T22:51:00.9275645Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9275760Z self.run_subtests( 2023-01-11T22:51:00.9276105Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9276267Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9276624Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9276758Z fsdp_loss = 
self._train_for_several_steps( 2023-01-11T22:51:00.9277124Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9277239Z output = model(*input) 2023-01-11T22:51:00.9277558Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9277689Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9278057Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9278228Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9278593Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9278696Z _lazy_init(state, module) 2023-01-11T22:51:00.9279041Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9279205Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9279592Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9279732Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9280068Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9280188Z return func(*args, **kwargs) 2023-01-11T22:51:00.9280615Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9280701Z p_assert( 2023-01-11T22:51:00.9281038Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9281160Z traceback.print_stack() 2023-01-11T22:51:00.9281899Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still 
has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9282630Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9282760Z File "<string>", line 1, in <module> 2023-01-11T22:51:00.9282968Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9283152Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9283356Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9283501Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9283695Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9283795Z self.run() 2023-01-11T22:51:00.9283994Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9284137Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9284473Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9284600Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9284957Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9285063Z getattr(self, test_name)() 2023-01-11T22:51:00.9285416Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9285507Z fn() 2023-01-11T22:51:00.9285864Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9285983Z test(self, **param_kwargs) 2023-01-11T22:51:00.9286329Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9286450Z return func(*args, **kwargs) 2023-01-11T22:51:00.9286739Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9286835Z self.run_subtests( 2023-01-11T22:51:00.9287185Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9287343Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9287706Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9287855Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9288226Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9288340Z output = model(*input) 2023-01-11T22:51:00.9288657Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9288777Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9289143Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9289366Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9289727Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9289846Z _lazy_init(state, module) 2023-01-11T22:51:00.9290193Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9290358Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9290743Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9290879Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9291198Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9291323Z return func(*args, **kwargs) 2023-01-11T22:51:00.9291693Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9291797Z p_assert( 2023-01-11T22:51:00.9292172Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9292300Z traceback.print_stack() 2023-01-11T22:51:00.9292421Z File "", line 1, in 2023-01-11T22:51:00.9292612Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9292796Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9292992Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9293134Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9293337Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9293435Z self.run() 2023-01-11T22:51:00.9293628Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9293771Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9294101Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9294227Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9294580Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9294699Z getattr(self, test_name)() 2023-01-11T22:51:00.9295048Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9295144Z fn() 2023-01-11T22:51:00.9295499Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9295620Z test(self, **param_kwargs) 2023-01-11T22:51:00.9295956Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9296081Z return func(*args, **kwargs) 2023-01-11T22:51:00.9296368Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9296482Z self.run_subtests( 2023-01-11T22:51:00.9297034Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9297193Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9297553Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9297699Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9298054Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9298168Z output = model(*input) 2023-01-11T22:51:00.9298490Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9298711Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9299091Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 
2023-01-11T22:51:00.9299261Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9299621Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9299739Z _lazy_init(state, module) 2023-01-11T22:51:00.9300071Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9300233Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9300621Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9300762Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9301092Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9301212Z return func(*args, **kwargs) 2023-01-11T22:51:00.9301639Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9301748Z p_assert( 2023-01-11T22:51:00.9302065Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9302188Z traceback.print_stack() 2023-01-11T22:51:00.9302312Z File "", line 1, in 2023-01-11T22:51:00.9302513Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9302647Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9302842Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9302988Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9303194Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9303281Z self.run() 2023-01-11T22:51:00.9303480Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 
2023-01-11T22:51:00.9303621Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9303952Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9304077Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9304427Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9304546Z getattr(self, test_name)() 2023-01-11T22:51:00.9304883Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9304980Z fn() 2023-01-11T22:51:00.9305336Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9305458Z test(self, **param_kwargs) 2023-01-11T22:51:00.9305808Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9305930Z return func(*args, **kwargs) 2023-01-11T22:51:00.9306219Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9306328Z self.run_subtests( 2023-01-11T22:51:00.9306658Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9306814Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9307172Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9307319Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9307749Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9307867Z output = model(*input) 2023-01-11T22:51:00.9308190Z File 
"/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9308324Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9308677Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9308845Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9309205Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9309321Z _lazy_init(state, module) 2023-01-11T22:51:00.9309666Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9309833Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9310267Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9310415Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9310750Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9310858Z return func(*args, **kwargs) 2023-01-11T22:51:00.9311224Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9311322Z p_assert( 2023-01-11T22:51:00.9311654Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9311775Z traceback.print_stack() 2023-01-11T22:51:00.9311899Z File "", line 1, in 2023-01-11T22:51:00.9312109Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9312233Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9312428Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9312575Z 
return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9312780Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9312879Z self.run() 2023-01-11T22:51:00.9313076Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9313215Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9313549Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9313664Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9314019Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9314143Z getattr(self, test_name)() 2023-01-11T22:51:00.9314496Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9314593Z fn() 2023-01-11T22:51:00.9314948Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9315065Z test(self, **param_kwargs) 2023-01-11T22:51:00.9315412Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9315519Z return func(*args, **kwargs) 2023-01-11T22:51:00.9315807Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9315919Z self.run_subtests( 2023-01-11T22:51:00.9316266Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9316477Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9316836Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9316984Z fsdp_loss = 
self._train_for_several_steps( 2023-01-11T22:51:00.9317347Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9317449Z output = model(*input) 2023-01-11T22:51:00.9317770Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9317907Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9318271Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9318440Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9318801Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9318920Z _lazy_init(state, module) 2023-01-11T22:51:00.9319314Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9319472Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9319864Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9320002Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9320338Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9320462Z return func(*args, **kwargs) 2023-01-11T22:51:00.9320832Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9320935Z p_assert( 2023-01-11T22:51:00.9321264Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9321372Z traceback.print_stack() 2023-01-11T22:51:00.9321501Z File "", line 1, in 2023-01-11T22:51:00.9321707Z File 
"/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9321844Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9322041Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9322191Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9322399Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9322486Z self.run() 2023-01-11T22:51:00.9322683Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9322825Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9323159Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9323294Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9323653Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9323773Z getattr(self, test_name)() 2023-01-11T22:51:00.9324126Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9324206Z fn() 2023-01-11T22:51:00.9324564Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9324681Z test(self, **param_kwargs) 2023-01-11T22:51:00.9325036Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9325156Z return func(*args, **kwargs) 2023-01-11T22:51:00.9325440Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9325608Z self.run_subtests( 2023-01-11T22:51:00.9325956Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 
2023-01-11T22:51:00.9326099Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9326460Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9326607Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9326981Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9327100Z output = model(*input) 2023-01-11T22:51:00.9327422Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9327555Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9327930Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9328145Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9328520Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9328636Z _lazy_init(state, module) 2023-01-11T22:51:00.9328985Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9329149Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9329542Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9329680Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9330011Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9330134Z return func(*args, **kwargs) 2023-01-11T22:51:00.9330494Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9330594Z p_assert( 
2023-01-11T22:51:00.9330928Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9331051Z traceback.print_stack() 2023-01-11T22:51:00.9331178Z File "", line 1, in 2023-01-11T22:51:00.9331380Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9331516Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9331697Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9331841Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9332043Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9332144Z self.run() 2023-01-11T22:51:00.9332337Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9332479Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9332810Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9332938Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9333279Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9333395Z getattr(self, test_name)() 2023-01-11T22:51:00.9333743Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9333837Z fn() 2023-01-11T22:51:00.9334193Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9334368Z test(self, **param_kwargs) 2023-01-11T22:51:00.9334717Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9334840Z return func(*args, **kwargs) 2023-01-11T22:51:00.9335115Z File 
"/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9335222Z self.run_subtests( 2023-01-11T22:51:00.9335569Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9335727Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9336085Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9336232Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9336847Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9336981Z output = model(*input) 2023-01-11T22:51:00.9337370Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9337515Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9337890Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9338061Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9338419Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9338538Z _lazy_init(state, module) 2023-01-11T22:51:00.9338883Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9339047Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9339426Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9339565Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9339899Z File 
"/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9340016Z return func(*args, **kwargs) 2023-01-11T22:51:00.9340388Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9340485Z p_assert( 2023-01-11T22:51:00.9340815Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9340935Z traceback.print_stack() 2023-01-11T22:51:00.9341047Z File "", line 1, in 2023-01-11T22:51:00.9341250Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9341390Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9341581Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9341725Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9341932Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9342030Z self.run() 2023-01-11T22:51:00.9342212Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9342352Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9342688Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9342814Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9343164Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9343285Z getattr(self, test_name)() 2023-01-11T22:51:00.9343634Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9343805Z fn() 2023-01-11T22:51:00.9344160Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 
2023-01-11T22:51:00.9344277Z test(self, **param_kwargs) 2023-01-11T22:51:00.9344623Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9344746Z return func(*args, **kwargs) 2023-01-11T22:51:00.9345033Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9345144Z self.run_subtests( 2023-01-11T22:51:00.9345486Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9345640Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9345984Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9346133Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9346544Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9346666Z output = model(*input) 2023-01-11T22:51:00.9346986Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9347120Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9347491Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9347660Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9348003Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9348126Z _lazy_init(state, module) 2023-01-11T22:51:00.9348468Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9348632Z _share_state_and_init_handle_attrs(state, root_module) 
2023-01-11T22:51:00.9349027Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9349163Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9349493Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9349616Z return func(*args, **kwargs) 2023-01-11T22:51:00.9349983Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9350068Z p_assert( 2023-01-11T22:51:00.9350396Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9350523Z traceback.print_stack() 2023-01-11T22:51:00.9350648Z File "", line 1, in 2023-01-11T22:51:00.9350853Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9350989Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9351181Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9351312Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9351515Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9351611Z self.run() 2023-01-11T22:51:00.9351807Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9351949Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9352286Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9352481Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9352838Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9352948Z getattr(self, test_name)() 2023-01-11T22:51:00.9353301Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9353394Z fn() 2023-01-11T22:51:00.9353750Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9353868Z test(self, **param_kwargs) 2023-01-11T22:51:00.9354215Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9354333Z return func(*args, **kwargs) 2023-01-11T22:51:00.9354621Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9354721Z self.run_subtests( 2023-01-11T22:51:00.9355069Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9355271Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9355640Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9355792Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9356160Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9356274Z output = model(*input) 2023-01-11T22:51:00.9356596Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9356715Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9357079Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9357253Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9357614Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:51:00.9357735Z _lazy_init(state, module) 2023-01-11T22:51:00.9358083Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9358247Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9358638Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9358761Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9359092Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9359213Z return func(*args, **kwargs) 2023-01-11T22:51:00.9359590Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9359689Z p_assert( 2023-01-11T22:51:00.9360020Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9360138Z traceback.print_stack() 2023-01-11T22:51:00.9360264Z File "", line 1, in 2023-01-11T22:51:00.9360451Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9360589Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9360786Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9360928Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9361131Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9361229Z self.run() 2023-01-11T22:51:00.9361480Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9361608Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9361951Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9362080Z 
self.run_test(test_name, pipe) 2023-01-11T22:51:00.9362433Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9362554Z getattr(self, test_name)() 2023-01-11T22:51:00.9362907Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9363003Z fn() 2023-01-11T22:51:00.9363361Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9363466Z test(self, **param_kwargs) 2023-01-11T22:51:00.9363817Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9363941Z return func(*args, **kwargs) 2023-01-11T22:51:00.9364272Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9364389Z self.run_subtests( 2023-01-11T22:51:00.9364738Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9364894Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9365250Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9365384Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9365753Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9365866Z output = model(*input) 2023-01-11T22:51:00.9366193Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9366328Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9366697Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9366864Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9367221Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9367324Z _lazy_init(state, module) 2023-01-11T22:51:00.9367665Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9367827Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9368215Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9368356Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9368691Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9368810Z return func(*args, **kwargs) 2023-01-11T22:51:00.9369179Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9369279Z p_assert( 2023-01-11T22:51:00.9369597Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9369719Z traceback.print_stack() 2023-01-11T22:51:00.9369841Z File "", line 1, in 2023-01-11T22:51:00.9370042Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9370181Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9370374Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9370575Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9370766Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9370869Z self.run() 
2023-01-11T22:51:00.9371065Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9371206Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9371545Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9371674Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9372030Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9372148Z getattr(self, test_name)() 2023-01-11T22:51:00.9372484Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9372582Z fn() 2023-01-11T22:51:00.9372939Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9373057Z test(self, **param_kwargs) 2023-01-11T22:51:00.9373448Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9373575Z return func(*args, **kwargs) 2023-01-11T22:51:00.9373863Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9373975Z self.run_subtests( 2023-01-11T22:51:00.9374311Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9374465Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9374816Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9374967Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9375338Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 
2023-01-11T22:51:00.9375453Z output = model(*input) 2023-01-11T22:51:00.9375775Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9375909Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9376262Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9376430Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9376981Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9377098Z _lazy_init(state, module) 2023-01-11T22:51:00.9377445Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9377615Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9378012Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9378149Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9378465Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9378586Z return func(*args, **kwargs) 2023-01-11T22:51:00.9378955Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9379055Z p_assert( 2023-01-11T22:51:00.9379385Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9379507Z traceback.print_stack() 2023-01-11T22:51:00.9379718Z File "", line 1, in 2023-01-11T22:51:00.9379922Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9380045Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9380246Z File 
"/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9380395Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9380601Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9380700Z self.run() 2023-01-11T22:51:00.9380897Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9381033Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9381357Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9381488Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9381842Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9381966Z getattr(self, test_name)() 2023-01-11T22:51:00.9382383Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9382483Z fn() 2023-01-11T22:51:00.9382848Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9382966Z test(self, **param_kwargs) 2023-01-11T22:51:00.9383300Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9383426Z return func(*args, **kwargs) 2023-01-11T22:51:00.9383708Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9383817Z self.run_subtests( 2023-01-11T22:51:00.9384162Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9384322Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9384684Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9384828Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9385180Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9385292Z output = model(*input) 2023-01-11T22:51:00.9385609Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9385743Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9386110Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9386282Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9386646Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9386765Z _lazy_init(state, module) 2023-01-11T22:51:00.9387097Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9387260Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9387654Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9387790Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9388121Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9388243Z return func(*args, **kwargs) 2023-01-11T22:51:00.9388612Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9388765Z p_assert( 2023-01-11T22:51:00.9389101Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in 
p_assert 2023-01-11T22:51:00.9389213Z traceback.print_stack() 2023-01-11T22:51:00.9389335Z File "", line 1, in 2023-01-11T22:51:00.9389538Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9389679Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9389870Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9390015Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9390220Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9390306Z self.run() 2023-01-11T22:51:00.9390501Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9390645Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9390980Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9391164Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9391529Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9391648Z getattr(self, test_name)() 2023-01-11T22:51:00.9392001Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9392081Z fn() 2023-01-11T22:51:00.9392440Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9392561Z test(self, **param_kwargs) 2023-01-11T22:51:00.9392962Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9393090Z return func(*args, **kwargs) 2023-01-11T22:51:00.9393384Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9393494Z 
self.run_subtests( 2023-01-11T22:51:00.9393835Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9393980Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9394335Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9394481Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9394851Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9394966Z output = model(*input) 2023-01-11T22:51:00.9395286Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9395423Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9395789Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9395945Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9396302Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9396415Z _lazy_init(state, module) 2023-01-11T22:51:00.9396761Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9396921Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9397311Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9397448Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9397876Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9397983Z return func(*args, **kwargs) 2023-01-11T22:51:00.9398353Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9398451Z p_assert( 2023-01-11T22:51:00.9398777Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9398899Z traceback.print_stack() 2023-01-11T22:51:00.9399638Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9400411Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.9400548Z File "", line 1, in 2023-01-11T22:51:00.9400756Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9400891Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9401077Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9401223Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9401433Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9401536Z self.run() 2023-01-11T22:51:00.9401733Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9401874Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9402214Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9402329Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9402688Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9402809Z getattr(self, test_name)() 2023-01-11T22:51:00.9403160Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9403254Z fn() 2023-01-11T22:51:00.9403610Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9403730Z test(self, **param_kwargs) 2023-01-11T22:51:00.9404077Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9404184Z return func(*args, **kwargs) 2023-01-11T22:51:00.9404473Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9404583Z self.run_subtests( 2023-01-11T22:51:00.9404928Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9405088Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9405447Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9405592Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9405961Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9406063Z output = model(*input) 2023-01-11T22:51:00.9406384Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9406576Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9406947Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9407120Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9407482Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9407596Z _lazy_init(state, module) 2023-01-11T22:51:00.9407943Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9408094Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9408487Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9408625Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9408957Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9409084Z return func(*args, **kwargs) 2023-01-11T22:51:00.9409499Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9409606Z p_assert( 2023-01-11T22:51:00.9409939Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9410049Z traceback.print_stack() 2023-01-11T22:51:00.9410174Z File "", line 1, in 2023-01-11T22:51:00.9410379Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9410518Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9410717Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9410863Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9411072Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9411179Z self.run() 2023-01-11T22:51:00.9411362Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9411505Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9411838Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9411970Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9412321Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9412439Z getattr(self, test_name)() 2023-01-11T22:51:00.9412791Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9412885Z fn() 2023-01-11T22:51:00.9413227Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9413348Z test(self, **param_kwargs) 2023-01-11T22:51:00.9413696Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.9413821Z return func(*args, **kwargs) 2023-01-11T22:51:00.9414110Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9414226Z self.run_subtests( 2023-01-11T22:51:00.9414565Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9414720Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9415062Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9415209Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9415578Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9415752Z output = model(*input) 2023-01-11T22:51:00.9416073Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9416207Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9416824Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9417009Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9417365Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9417479Z _lazy_init(state, module) 2023-01-11T22:51:00.9417821Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9417985Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9418380Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9418592Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.9418943Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9419062Z return func(*args, **kwargs) 2023-01-11T22:51:00.9419420Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9419516Z p_assert( 2023-01-11T22:51:00.9419845Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9419967Z traceback.print_stack() 2023-01-11T22:51:00.9420092Z File "", line 1, in 2023-01-11T22:51:00.9420292Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9420434Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9420616Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9420768Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9420974Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9421073Z self.run() 2023-01-11T22:51:00.9421268Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9421408Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9421746Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9421874Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9422215Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9422338Z getattr(self, test_name)() 2023-01-11T22:51:00.9422693Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9422787Z fn() 2023-01-11T22:51:00.9423144Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9423264Z test(self, **param_kwargs) 2023-01-11T22:51:00.9423609Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9423730Z return func(*args, **kwargs) 2023-01-11T22:51:00.9424005Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9424111Z self.run_subtests( 2023-01-11T22:51:00.9424453Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9424608Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9425039Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9425193Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9425564Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9425681Z output = model(*input) 2023-01-11T22:51:00.9425986Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9426121Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9426489Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9426654Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9427015Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9427138Z _lazy_init(state, module) 2023-01-11T22:51:00.9427532Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9427703Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9428083Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9428217Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9428549Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9428671Z return func(*args, **kwargs) 2023-01-11T22:51:00.9429041Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9429138Z p_assert( 2023-01-11T22:51:00.9429464Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9429589Z traceback.print_stack() 2023-01-11T22:51:00.9429699Z File "", line 1, in 2023-01-11T22:51:00.9429906Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9430045Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9430243Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9430391Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9430596Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9430698Z self.run() 2023-01-11T22:51:00.9430895Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9431022Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9431355Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9431486Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9431839Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9431962Z getattr(self, test_name)() 2023-01-11T22:51:00.9432315Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9432410Z fn() 2023-01-11T22:51:00.9432753Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9432875Z test(self, **param_kwargs) 2023-01-11T22:51:00.9433222Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9433342Z return func(*args, **kwargs) 2023-01-11T22:51:00.9433633Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9433800Z self.run_subtests( 2023-01-11T22:51:00.9434151Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9434309Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9434668Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9434802Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9435168Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9435281Z output = model(*input) 2023-01-11T22:51:00.9435599Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9435732Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9436101Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 
2023-01-11T22:51:00.9436275Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9436686Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9436794Z _lazy_init(state, module) 2023-01-11T22:51:00.9437143Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9437308Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9437700Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9437838Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9438175Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9438304Z return func(*args, **kwargs) 2023-01-11T22:51:00.9438678Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9438766Z p_assert( 2023-01-11T22:51:00.9439095Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9439215Z traceback.print_stack() 2023-01-11T22:51:00.9439341Z File "", line 1, in 2023-01-11T22:51:00.9439542Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9439677Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9439874Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9440008Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9440217Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9440322Z self.run() 2023-01-11T22:51:00.9440516Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 
2023-01-11T22:51:00.9440656Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9440988Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9441114Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9441466Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9441571Z getattr(self, test_name)() 2023-01-11T22:51:00.9441922Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9442014Z fn() 2023-01-11T22:51:00.9442371Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9442492Z test(self, **param_kwargs) 2023-01-11T22:51:00.9442897Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9443019Z return func(*args, **kwargs) 2023-01-11T22:51:00.9443310Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9443408Z self.run_subtests( 2023-01-11T22:51:00.9443752Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9443906Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9444261Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9444408Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9444775Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9444890Z output = model(*input) 2023-01-11T22:51:00.9445209Z File 
"/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9445372Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9445752Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9445920Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9446280Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9446397Z _lazy_init(state, module) 2023-01-11T22:51:00.9446744Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9446908Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9447298Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9447427Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9447764Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9447886Z return func(*args, **kwargs) 2023-01-11T22:51:00.9448257Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9448355Z p_assert( 2023-01-11T22:51:00.9448684Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9448806Z traceback.print_stack() 2023-01-11T22:51:00.9448934Z File "", line 1, in 2023-01-11T22:51:00.9449122Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9449256Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9449458Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9449605Z 
return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9449816Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9449919Z self.run() 2023-01-11T22:51:00.9450113Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9450251Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9450570Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9450700Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9451052Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9451172Z getattr(self, test_name)() 2023-01-11T22:51:00.9451523Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9451673Z fn() 2023-01-11T22:51:00.9452029Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9452138Z test(self, **param_kwargs) 2023-01-11T22:51:00.9452484Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9452601Z return func(*args, **kwargs) 2023-01-11T22:51:00.9452888Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9452998Z self.run_subtests( 2023-01-11T22:51:00.9453342Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9453494Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9453853Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9454010Z fsdp_loss = 
self._train_for_several_steps( 2023-01-11T22:51:00.9454421Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9454546Z output = model(*input) 2023-01-11T22:51:00.9454870Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9455003Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9455371Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9455542Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9455899Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9456015Z _lazy_init(state, module) 2023-01-11T22:51:00.9456352Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9456514Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9457104Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9457246Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9457582Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9457703Z return func(*args, **kwargs) 2023-01-11T22:51:00.9458072Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9458173Z p_assert( 2023-01-11T22:51:00.9458485Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9458609Z traceback.print_stack() 2023-01-11T22:51:00.9459355Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still 
has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9460224Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9461498Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9462958Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9464222Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9465458Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.9466802Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9468061Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9470011Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_exec_order_utils.py:237: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:51:00.9470271Z (rank, world_num_valid_indices[rank]) 2023-01-11T22:51:00.9472148Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_exec_order_utils.py:259: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:51:00.9472328Z world_indices[ 2023-01-11T22:51:00.9474082Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_exec_order_utils.py:259: UserWarning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:51:00.9474262Z world_indices[ 2023-01-11T22:51:00.9474432Z dist init r=0, world=2 2023-01-11T22:51:00.9474940Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.9475451Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.9475953Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.9476424Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.9477152Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.9477767Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.9478375Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.9478980Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 
2023-01-11T22:51:00.9479585Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.9480183Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.9480936Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.9481508Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.9481699Z dist init r=1, world=2 2023-01-11T22:51:00.9482235Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.9482804Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.9483408Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.9484022Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.9484640Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.9485261Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 
2023-01-11T22:51:00.9485881Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.9486502Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.9487130Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.9487761Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.9488383Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.9489000Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.9489199Z ok (5.813s) 2023-01-11T22:51:00.9489937Z test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 95771 2023-01-11T22:51:00.9490494Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 95772 2023-01-11T22:51:00.9491300Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.9491630Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.9492403Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.9492864Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.9493630Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.9493983Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.9494734Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.9495138Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.9495725Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.9496208Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.9497342Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.9498158Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:51:00.9498634Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.9499087Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.9501189Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.9501426Z warnings.warn( 2023-01-11T22:51:00.9503507Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:51:00.9503734Z warnings.warn( 2023-01-11T22:51:00.9503962Z File "", line 1, in 2023-01-11T22:51:00.9504370Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9504669Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9505062Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9505365Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9505780Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9505979Z self.run() 2023-01-11T22:51:00.9506345Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9506630Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9507325Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9507598Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9508494Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9508740Z getattr(self, test_name)() 2023-01-11T22:51:00.9509487Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9509679Z fn() 2023-01-11T22:51:00.9510405Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9510658Z test(self, **param_kwargs) 2023-01-11T22:51:00.9511386Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9511623Z return func(*args, **kwargs) 2023-01-11T22:51:00.9512205Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9512440Z self.run_subtests( 
2023-01-11T22:51:00.9513157Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9513469Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9514327Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9514639Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9515419Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9515653Z output = model(*input) 2023-01-11T22:51:00.9516322Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9516599Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9517370Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9517723Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9518453Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9518701Z _lazy_init(state, module) 2023-01-11T22:51:00.9519413Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9519756Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9520572Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9520868Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9521553Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9521802Z return func(*args, **kwargs) 2023-01-11T22:51:00.9522554Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9522773Z p_assert( 2023-01-11T22:51:00.9523461Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9523714Z traceback.print_stack() 2023-01-11T22:51:00.9523949Z File "", line 1, in 2023-01-11T22:51:00.9524354Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9524628Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9525024Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9525285Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9525712Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9525907Z self.run() 2023-01-11T22:51:00.9526305Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9526708Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9527396Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9527662Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9528405Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9528636Z getattr(self, test_name)() 2023-01-11T22:51:00.9529366Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9529558Z fn() 2023-01-11T22:51:00.9530313Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9530553Z test(self, **param_kwargs) 2023-01-11T22:51:00.9531271Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.9531537Z return func(*args, **kwargs) 2023-01-11T22:51:00.9532216Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9532427Z self.run_subtests( 2023-01-11T22:51:00.9533142Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9533465Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9534200Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9534502Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9535269Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9535501Z output = model(*input) 2023-01-11T22:51:00.9536159Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9536429Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9537465Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9537818Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9538576Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9538810Z _lazy_init(state, module) 2023-01-11T22:51:00.9539528Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9539870Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9540675Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9540958Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.9541645Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9541884Z return func(*args, **kwargs) 2023-01-11T22:51:00.9542669Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9542870Z p_assert( 2023-01-11T22:51:00.9543547Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9543798Z traceback.print_stack() 2023-01-11T22:51:00.9544045Z File "", line 1, in 2023-01-11T22:51:00.9544437Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9544715Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9545108Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9545399Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9545988Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9546185Z self.run() 2023-01-11T22:51:00.9546588Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9546846Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9547543Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9547801Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9548528Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9548785Z getattr(self, test_name)() 2023-01-11T22:51:00.9549519Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9549706Z fn() 2023-01-11T22:51:00.9550463Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9550696Z test(self, **param_kwargs) 2023-01-11T22:51:00.9551563Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9551841Z return func(*args, **kwargs) 2023-01-11T22:51:00.9552419Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9552637Z self.run_subtests( 2023-01-11T22:51:00.9553367Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9553704Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9554443Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9554726Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9555494Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9555732Z output = model(*input) 2023-01-11T22:51:00.9556395Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9556675Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9557435Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9557783Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9558530Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9558744Z _lazy_init(state, module) 2023-01-11T22:51:00.9559475Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9559817Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9560635Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9560921Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9561610Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9561865Z return func(*args, **kwargs) 2023-01-11T22:51:00.9562634Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9562811Z p_assert( 2023-01-11T22:51:00.9563489Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9563759Z traceback.print_stack() 2023-01-11T22:51:00.9563993Z File "", line 1, in 2023-01-11T22:51:00.9564533Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9564811Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9565212Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9565513Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9565899Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9566112Z self.run() 2023-01-11T22:51:00.9566496Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9566788Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9567477Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9567734Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9568467Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9568726Z getattr(self, test_name)() 2023-01-11T22:51:00.9569433Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9569753Z fn() 2023-01-11T22:51:00.9570506Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9570742Z test(self, **param_kwargs) 2023-01-11T22:51:00.9571470Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9571719Z return func(*args, **kwargs) 2023-01-11T22:51:00.9572301Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9572523Z self.run_subtests( 2023-01-11T22:51:00.9573221Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9573552Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9574302Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9574612Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9575378Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9575618Z output = model(*input) 2023-01-11T22:51:00.9576279Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9576794Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9577555Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 
2023-01-11T22:51:00.9577915Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9578664Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9578895Z _lazy_init(state, module) 2023-01-11T22:51:00.9579627Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9579962Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9580769Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9581055Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9581716Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9581955Z return func(*args, **kwargs) 2023-01-11T22:51:00.9582719Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9583088Z p_assert( 2023-01-11T22:51:00.9583782Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9584040Z traceback.print_stack() 2023-01-11T22:51:00.9584285Z File "", line 1, in 2023-01-11T22:51:00.9584700Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9584947Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9585347Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9585637Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9586059Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9586264Z self.run() 2023-01-11T22:51:00.9586658Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 
2023-01-11T22:51:00.9586936Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9587614Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9587872Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9588755Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9589011Z getattr(self, test_name)() 2023-01-11T22:51:00.9589744Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9589942Z fn() 2023-01-11T22:51:00.9590687Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9590925Z test(self, **param_kwargs) 2023-01-11T22:51:00.9591631Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9591886Z return func(*args, **kwargs) 2023-01-11T22:51:00.9592475Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9592712Z self.run_subtests( 2023-01-11T22:51:00.9593516Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9593846Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9594586Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9594897Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9595638Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9595873Z output = model(*input) 2023-01-11T22:51:00.9596542Z File 
"/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9596819Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9597590Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9597946Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9598697Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9598940Z _lazy_init(state, module) 2023-01-11T22:51:00.9599641Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9599971Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9600784Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9601058Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9601745Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9602127Z return func(*args, **kwargs) 2023-01-11T22:51:00.9602897Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9603101Z p_assert( 2023-01-11T22:51:00.9603763Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9604025Z traceback.print_stack() 2023-01-11T22:51:00.9604265Z File "", line 1, in 2023-01-11T22:51:00.9604676Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9604951Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9605339Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9605645Z 
return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9606053Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9606250Z self.run() 2023-01-11T22:51:00.9606622Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9607058Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9607759Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9608032Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9608760Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9608997Z getattr(self, test_name)() 2023-01-11T22:51:00.9609739Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9609922Z fn() 2023-01-11T22:51:00.9610668Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9610923Z test(self, **param_kwargs) 2023-01-11T22:51:00.9611663Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9611911Z return func(*args, **kwargs) 2023-01-11T22:51:00.9612494Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9612731Z self.run_subtests( 2023-01-11T22:51:00.9613445Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9613741Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9614488Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9614794Z fsdp_loss = 
self._train_for_several_steps( 2023-01-11T22:51:00.9615561Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9615811Z output = model(*input) 2023-01-11T22:51:00.9616477Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9617067Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9617853Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9618179Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9618929Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9619182Z _lazy_init(state, module) 2023-01-11T22:51:00.9619894Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9620227Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9621196Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9621478Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9622172Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9622398Z return func(*args, **kwargs) 2023-01-11T22:51:00.9623170Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9623384Z p_assert( 2023-01-11T22:51:00.9624057Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9624306Z traceback.print_stack() 2023-01-11T22:51:00.9625828Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still 
has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9627486Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9627761Z File "", line 1, in 2023-01-11T22:51:00.9628179Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9628441Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9628835Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9629151Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9629563Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9629790Z self.run() 2023-01-11T22:51:00.9630174Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9630461Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9631165Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9631402Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9632139Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9632380Z getattr(self, test_name)() 2023-01-11T22:51:00.9633115Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9633328Z fn() 2023-01-11T22:51:00.9634067Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9634301Z test(self, **param_kwargs) 2023-01-11T22:51:00.9635047Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9635278Z return func(*args, **kwargs) 2023-01-11T22:51:00.9635861Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9636088Z self.run_subtests( 2023-01-11T22:51:00.9636804Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9637124Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9637864Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9638180Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9638943Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9639275Z output = model(*input) 2023-01-11T22:51:00.9639943Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9640227Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9640998Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9641351Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9642099Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9642348Z _lazy_init(state, module) 2023-01-11T22:51:00.9643068Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9643371Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9644192Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9644484Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9645281Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9645549Z return func(*args, **kwargs) 2023-01-11T22:51:00.9646322Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9646537Z p_assert( 2023-01-11T22:51:00.9647221Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9647444Z traceback.print_stack() 2023-01-11T22:51:00.9647681Z File "", line 1, in 2023-01-11T22:51:00.9648090Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9648375Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9648772Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9649081Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9649492Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9649699Z self.run() 2023-01-11T22:51:00.9650055Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9650353Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9651039Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9651298Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9652051Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9652291Z getattr(self, test_name)() 2023-01-11T22:51:00.9653027Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9653213Z fn() 2023-01-11T22:51:00.9653963Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9654201Z test(self, **param_kwargs) 2023-01-11T22:51:00.9654923Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9655178Z return func(*args, **kwargs) 2023-01-11T22:51:00.9655755Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9655975Z self.run_subtests( 2023-01-11T22:51:00.9656992Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9657310Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9658060Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9658543Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9659317Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9659560Z output = model(*input) 2023-01-11T22:51:00.9660228Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9660505Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9661276Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 
2023-01-11T22:51:00.9661638Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9662363Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9662626Z _lazy_init(state, module) 2023-01-11T22:51:00.9663338Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9663795Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9664615Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9664897Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9665588Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9665835Z return func(*args, **kwargs) 2023-01-11T22:51:00.9666583Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9666794Z p_assert( 2023-01-11T22:51:00.9667480Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9667730Z traceback.print_stack() 2023-01-11T22:51:00.9667974Z File "", line 1, in 2023-01-11T22:51:00.9668383Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9668678Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9669035Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9669349Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9669760Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9669972Z self.run() 2023-01-11T22:51:00.9670360Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 
2023-01-11T22:51:00.9670641Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9671330Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9671596Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9672321Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9672559Z getattr(self, test_name)() 2023-01-11T22:51:00.9673293Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9673497Z fn() 2023-01-11T22:51:00.9674235Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9674476Z test(self, **param_kwargs) 2023-01-11T22:51:00.9675210Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9675455Z return func(*args, **kwargs) 2023-01-11T22:51:00.9676012Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9676358Z self.run_subtests( 2023-01-11T22:51:00.9677083Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9677400Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9678146Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9678450Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9679214Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9679467Z output = model(*input) 2023-01-11T22:51:00.9680096Z File 
"/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9680359Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9681129Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9681472Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9682321Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9682573Z _lazy_init(state, module) 2023-01-11T22:51:00.9683295Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9683623Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9684414Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9684706Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9685402Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9685646Z return func(*args, **kwargs) 2023-01-11T22:51:00.9686430Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9686642Z p_assert( 2023-01-11T22:51:00.9687325Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9687578Z traceback.print_stack() 2023-01-11T22:51:00.9687813Z File "", line 1, in 2023-01-11T22:51:00.9688202Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9688489Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9688878Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9689166Z 
return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9689581Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9689774Z self.run() 2023-01-11T22:51:00.9690170Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9690442Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9691139Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9691419Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9692141Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9692375Z getattr(self, test_name)() 2023-01-11T22:51:00.9693184Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9693380Z fn() 2023-01-11T22:51:00.9694102Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9694348Z test(self, **param_kwargs) 2023-01-11T22:51:00.9695077Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9695431Z return func(*args, **kwargs) 2023-01-11T22:51:00.9696015Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9696252Z self.run_subtests( 2023-01-11T22:51:00.9697224Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9697549Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9698288Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9698613Z fsdp_loss = 
self._train_for_several_steps( 2023-01-11T22:51:00.9699378Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9699614Z output = model(*input) 2023-01-11T22:51:00.9700289Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9700555Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9701488Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9701841Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9702598Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9702824Z _lazy_init(state, module) 2023-01-11T22:51:00.9703542Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9703880Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9704690Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9704984Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9705681Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9705932Z return func(*args, **kwargs) 2023-01-11T22:51:00.9706701Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9706899Z p_assert( 2023-01-11T22:51:00.9707575Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9707824Z traceback.print_stack() 2023-01-11T22:51:00.9708071Z File "", line 1, in 2023-01-11T22:51:00.9708470Z File 
"/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9708755Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9709139Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9709424Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9709830Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9710037Z self.run() 2023-01-11T22:51:00.9710439Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9710718Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9711418Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9711687Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9712419Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9712638Z getattr(self, test_name)() 2023-01-11T22:51:00.9713369Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9713574Z fn() 2023-01-11T22:51:00.9714462Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9714717Z test(self, **param_kwargs) 2023-01-11T22:51:00.9715452Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9715689Z return func(*args, **kwargs) 2023-01-11T22:51:00.9716283Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9716501Z self.run_subtests( 2023-01-11T22:51:00.9717214Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 
2023-01-11T22:51:00.9717527Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9718278Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9718589Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9719358Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9719693Z output = model(*input) 2023-01-11T22:51:00.9720366Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9720618Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9721381Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9721715Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9722474Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9722711Z _lazy_init(state, module) 2023-01-11T22:51:00.9723430Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9723784Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9724604Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9724878Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9725566Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9725806Z return func(*args, **kwargs) 2023-01-11T22:51:00.9726580Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9726779Z p_assert( 
2023-01-11T22:51:00.9727459Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9727713Z traceback.print_stack() 2023-01-11T22:51:00.9727956Z File "", line 1, in 2023-01-11T22:51:00.9728348Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9728623Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9729025Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9729317Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9729730Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9729921Z self.run() 2023-01-11T22:51:00.9730317Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9730567Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9731257Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9731532Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9732265Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9732631Z getattr(self, test_name)() 2023-01-11T22:51:00.9733360Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9733549Z fn() 2023-01-11T22:51:00.9734299Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9734526Z test(self, **param_kwargs) 2023-01-11T22:51:00.9735253Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9735508Z return func(*args, **kwargs) 2023-01-11T22:51:00.9736087Z File 
"/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9736314Z self.run_subtests( 2023-01-11T22:51:00.9737273Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9737593Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9738500Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9738796Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9739566Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9739824Z output = model(*input) 2023-01-11T22:51:00.9740484Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9740752Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9741520Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9741859Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9742613Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9742861Z _lazy_init(state, module) 2023-01-11T22:51:00.9743557Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9743891Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9744702Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9744963Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9745649Z File 
"/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9745885Z return func(*args, **kwargs) 2023-01-11T22:51:00.9746662Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9746878Z p_assert( 2023-01-11T22:51:00.9747535Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9747790Z traceback.print_stack() 2023-01-11T22:51:00.9748038Z File "", line 1, in 2023-01-11T22:51:00.9748439Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9748716Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9749109Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9749404Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9749798Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9749999Z self.run() 2023-01-11T22:51:00.9750386Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9750667Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9751526Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9751795Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9752543Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9752796Z getattr(self, test_name)() 2023-01-11T22:51:00.9753506Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9753697Z fn() 2023-01-11T22:51:00.9754443Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 
2023-01-11T22:51:00.9754687Z test(self, **param_kwargs) 2023-01-11T22:51:00.9755409Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9755662Z return func(*args, **kwargs) 2023-01-11T22:51:00.9756256Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9756480Z self.run_subtests( 2023-01-11T22:51:00.9757280Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9757622Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9758364Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9758675Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9759450Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9759683Z output = model(*input) 2023-01-11T22:51:00.9760344Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9760638Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9761376Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9761727Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9762484Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9762716Z _lazy_init(state, module) 2023-01-11T22:51:00.9763433Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9763769Z _share_state_and_init_handle_attrs(state, root_module) 
2023-01-11T22:51:00.9764582Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9764871Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9765537Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9765783Z return func(*args, **kwargs) 2023-01-11T22:51:00.9766546Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9766755Z p_assert( 2023-01-11T22:51:00.9767437Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9767684Z traceback.print_stack() 2023-01-11T22:51:00.9767939Z File "", line 1, in 2023-01-11T22:51:00.9768343Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9768600Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9768971Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9769276Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9769807Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9770001Z self.run() 2023-01-11T22:51:00.9770396Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9770677Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9771344Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9771628Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9772357Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9772592Z getattr(self, test_name)() 2023-01-11T22:51:00.9773333Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9773530Z fn() 2023-01-11T22:51:00.9774268Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9774518Z test(self, **param_kwargs) 2023-01-11T22:51:00.9775218Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9775574Z return func(*args, **kwargs) 2023-01-11T22:51:00.9776169Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9776402Z self.run_subtests( 2023-01-11T22:51:00.9777404Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9777721Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9778476Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9778782Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9779549Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9779777Z output = model(*input) 2023-01-11T22:51:00.9780435Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9780708Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9781564Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9781918Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9782664Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:51:00.9782914Z _lazy_init(state, module) 2023-01-11T22:51:00.9783628Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9783929Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9784756Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9785038Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9785727Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9785988Z return func(*args, **kwargs) 2023-01-11T22:51:00.9786758Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9786958Z p_assert( 2023-01-11T22:51:00.9787645Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9787875Z traceback.print_stack() 2023-01-11T22:51:00.9788112Z File "", line 1, in 2023-01-11T22:51:00.9788525Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9788976Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9789363Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9789678Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9790083Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9790264Z self.run() 2023-01-11T22:51:00.9790652Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9790938Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9791637Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9791900Z 
self.run_test(test_name, pipe) 2023-01-11T22:51:00.9792637Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9792873Z getattr(self, test_name)() 2023-01-11T22:51:00.9793686Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9793866Z fn() 2023-01-11T22:51:00.9794733Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9795011Z test(self, **param_kwargs) 2023-01-11T22:51:00.9795736Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9795970Z return func(*args, **kwargs) 2023-01-11T22:51:00.9796562Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9796794Z self.run_subtests( 2023-01-11T22:51:00.9797513Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9797801Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9798559Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9798866Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9799635Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9799876Z output = model(*input) 2023-01-11T22:51:00.9800532Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9800799Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9801571Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9801887Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9802644Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9802895Z _lazy_init(state, module) 2023-01-11T22:51:00.9803620Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9803960Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9804766Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9805051Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9805740Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9805955Z return func(*args, **kwargs) 2023-01-11T22:51:00.9806730Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9806938Z p_assert( 2023-01-11T22:51:00.9807618Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9807995Z traceback.print_stack() 2023-01-11T22:51:00.9808237Z File "", line 1, in 2023-01-11T22:51:00.9808656Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9808925Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9809303Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9809591Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9810010Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9810203Z self.run() 
2023-01-11T22:51:00.9810604Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9810884Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9811579Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9811826Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9812665Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9812933Z getattr(self, test_name)() 2023-01-11T22:51:00.9813665Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9813863Z fn() 2023-01-11T22:51:00.9814610Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9814855Z test(self, **param_kwargs) 2023-01-11T22:51:00.9815587Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9815819Z return func(*args, **kwargs) 2023-01-11T22:51:00.9816394Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9816944Z self.run_subtests( 2023-01-11T22:51:00.9817683Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9818013Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9818757Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9819069Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9819826Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 
2023-01-11T22:51:00.9820058Z output = model(*input) 2023-01-11T22:51:00.9820700Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9820989Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9821750Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9822107Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9822869Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9823106Z _lazy_init(state, module) 2023-01-11T22:51:00.9823835Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9824176Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9824959Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9825233Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9825912Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9826325Z return func(*args, **kwargs) 2023-01-11T22:51:00.9827099Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9827323Z p_assert( 2023-01-11T22:51:00.9828010Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9828253Z traceback.print_stack() 2023-01-11T22:51:00.9828471Z File "", line 1, in 2023-01-11T22:51:00.9828876Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9829142Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9829546Z File 
"/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9829836Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9830254Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9830465Z self.run() 2023-01-11T22:51:00.9830840Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9831115Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9831988Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9832267Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9833011Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9833267Z getattr(self, test_name)() 2023-01-11T22:51:00.9833991Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9834181Z fn() 2023-01-11T22:51:00.9834908Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9835159Z test(self, **param_kwargs) 2023-01-11T22:51:00.9835899Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9836141Z return func(*args, **kwargs) 2023-01-11T22:51:00.9836726Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9836948Z self.run_subtests( 2023-01-11T22:51:00.9837673Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9838006Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9838722Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9839034Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9839800Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9840037Z output = model(*input) 2023-01-11T22:51:00.9840708Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9841000Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9841765Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9842118Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9842843Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9843085Z _lazy_init(state, module) 2023-01-11T22:51:00.9843800Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9844133Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9844949Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9845353Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9846056Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9846316Z return func(*args, **kwargs) 2023-01-11T22:51:00.9847054Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9847267Z p_assert( 2023-01-11T22:51:00.9847957Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in 
p_assert 2023-01-11T22:51:00.9848207Z traceback.print_stack() 2023-01-11T22:51:00.9848447Z File "", line 1, in 2023-01-11T22:51:00.9848863Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9849142Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9849540Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9849826Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9850333Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9850550Z self.run() 2023-01-11T22:51:00.9850945Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9851226Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9851936Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9852199Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9852916Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9853173Z getattr(self, test_name)() 2023-01-11T22:51:00.9853905Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9854107Z fn() 2023-01-11T22:51:00.9854853Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9855096Z test(self, **param_kwargs) 2023-01-11T22:51:00.9855818Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9856073Z return func(*args, **kwargs) 2023-01-11T22:51:00.9856868Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9857100Z 
self.run_subtests( 2023-01-11T22:51:00.9857794Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9858133Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9858884Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9859185Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9859970Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9860208Z output = model(*input) 2023-01-11T22:51:00.9860852Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9861138Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9861908Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9862250Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9863005Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9863236Z _lazy_init(state, module) 2023-01-11T22:51:00.9864120Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9864460Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9865283Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9865554Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9866242Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9866496Z return func(*args, **kwargs) 2023-01-11T22:51:00.9867275Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9867473Z p_assert( 2023-01-11T22:51:00.9868156Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9868422Z traceback.print_stack() 2023-01-11T22:51:00.9870063Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9871577Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.9871836Z File "", line 1, in 2023-01-11T22:51:00.9872241Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9872529Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9872924Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9873230Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9873654Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9873858Z self.run() 2023-01-11T22:51:00.9874250Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9874497Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9875195Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9875468Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9876200Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9876453Z getattr(self, test_name)() 2023-01-11T22:51:00.9877185Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9877386Z fn() 2023-01-11T22:51:00.9878128Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9878361Z test(self, **param_kwargs) 2023-01-11T22:51:00.9879088Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9879336Z return func(*args, **kwargs) 2023-01-11T22:51:00.9879919Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9880150Z self.run_subtests( 2023-01-11T22:51:00.9880869Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9881193Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9881937Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9882323Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9883102Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9883338Z output = model(*input) 2023-01-11T22:51:00.9883996Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9884284Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9885043Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9885389Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9886124Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9886334Z _lazy_init(state, module) 2023-01-11T22:51:00.9887056Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9887399Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9888315Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9888610Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9889301Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9889557Z return func(*args, **kwargs) 2023-01-11T22:51:00.9890327Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9890505Z p_assert( 2023-01-11T22:51:00.9891183Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9891438Z traceback.print_stack() 2023-01-11T22:51:00.9891695Z File "", line 1, in 2023-01-11T22:51:00.9892098Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9892378Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9892781Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9893151Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9893552Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9893748Z self.run() 2023-01-11T22:51:00.9894144Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9894419Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9895116Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9895386Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9896119Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9896358Z getattr(self, test_name)() 2023-01-11T22:51:00.9897361Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9897560Z fn() 2023-01-11T22:51:00.9898312Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9898558Z test(self, **param_kwargs) 2023-01-11T22:51:00.9899294Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:00.9899538Z return func(*args, **kwargs) 2023-01-11T22:51:00.9900116Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9900317Z self.run_subtests( 2023-01-11T22:51:00.9901049Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9901597Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9902348Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9902665Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9903435Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9903675Z output = model(*input) 2023-01-11T22:51:00.9904346Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9904597Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9905361Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9905717Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9906472Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9906854Z _lazy_init(state, module) 2023-01-11T22:51:00.9907581Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9907918Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9908728Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9909006Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:00.9909675Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9909937Z return func(*args, **kwargs) 2023-01-11T22:51:00.9910707Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9910919Z p_assert( 2023-01-11T22:51:00.9911609Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9911874Z traceback.print_stack() 2023-01-11T22:51:00.9912123Z File "", line 1, in 2023-01-11T22:51:00.9912504Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9912791Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9913187Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9913476Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9913895Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9914093Z self.run() 2023-01-11T22:51:00.9914494Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9914775Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9915463Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9915741Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9916487Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9916744Z getattr(self, test_name)() 2023-01-11T22:51:00.9917476Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9917677Z fn() 2023-01-11T22:51:00.9918426Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9918674Z test(self, **param_kwargs) 2023-01-11T22:51:00.9919382Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9919756Z return func(*args, **kwargs) 2023-01-11T22:51:00.9920342Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9920589Z self.run_subtests( 2023-01-11T22:51:00.9921310Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9921623Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9922375Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9922674Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9923428Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9923670Z output = model(*input) 2023-01-11T22:51:00.9924336Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9924620Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9925511Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9925873Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9926620Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9926855Z _lazy_init(state, module) 2023-01-11T22:51:00.9927553Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9927893Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9928708Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9929005Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9929697Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9929941Z return func(*args, **kwargs) 2023-01-11T22:51:00.9930727Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9930932Z p_assert( 2023-01-11T22:51:00.9931596Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9931854Z traceback.print_stack() 2023-01-11T22:51:00.9932099Z File "", line 1, in 2023-01-11T22:51:00.9932508Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9932786Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9933180Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9933463Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9933899Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9934085Z self.run() 2023-01-11T22:51:00.9934477Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9934763Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9935417Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9935682Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9936404Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9936918Z getattr(self, test_name)() 2023-01-11T22:51:00.9937585Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9937747Z fn() 2023-01-11T22:51:00.9938358Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9938702Z test(self, **param_kwargs) 2023-01-11T22:51:00.9939273Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9939451Z return func(*args, **kwargs) 2023-01-11T22:51:00.9939893Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9940058Z self.run_subtests( 2023-01-11T22:51:00.9940592Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9940829Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9941396Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9941620Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9942268Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9942461Z output = model(*input) 2023-01-11T22:51:00.9943132Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9943356Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9943963Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 
2023-01-11T22:51:00.9944247Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9944881Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9945088Z _lazy_init(state, module) 2023-01-11T22:51:00.9945717Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9945979Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9946714Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9946956Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9947604Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9947811Z return func(*args, **kwargs) 2023-01-11T22:51:00.9948474Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9948647Z p_assert( 2023-01-11T22:51:00.9949298Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9949519Z traceback.print_stack() 2023-01-11T22:51:00.9949745Z File "", line 1, in 2023-01-11T22:51:00.9950094Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9950360Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9950704Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9950926Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9951286Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9951454Z self.run() 2023-01-11T22:51:00.9951765Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 
2023-01-11T22:51:00.9951960Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9952550Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9952790Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9953323Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9953578Z getattr(self, test_name)() 2023-01-11T22:51:00.9953946Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9954045Z fn() 2023-01-11T22:51:00.9954411Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9954517Z test(self, **param_kwargs) 2023-01-11T22:51:00.9954869Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9954992Z return func(*args, **kwargs) 2023-01-11T22:51:00.9955285Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9955402Z self.run_subtests( 2023-01-11T22:51:00.9955750Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9955910Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9956274Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9956478Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:00.9956862Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9956980Z output = model(*input) 2023-01-11T22:51:00.9957305Z File 
"/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9957439Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9957813Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9957984Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9958349Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9958456Z _lazy_init(state, module) 2023-01-11T22:51:00.9958811Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9958977Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9959372Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9959514Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9959850Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9959975Z return func(*args, **kwargs) 2023-01-11T22:51:00.9960348Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9960433Z p_assert( 2023-01-11T22:51:00.9960769Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9960894Z traceback.print_stack() 2023-01-11T22:51:00.9961022Z File "", line 1, in 2023-01-11T22:51:00.9961227Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:00.9961365Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:00.9961566Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:00.9961714Z 
return self._bootstrap(parent_sentinel) 2023-01-11T22:51:00.9961908Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:00.9962009Z self.run() 2023-01-11T22:51:00.9962209Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:00.9962351Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:00.9962688Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:00.9962878Z self.run_test(test_name, pipe) 2023-01-11T22:51:00.9963243Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:00.9963368Z getattr(self, test_name)() 2023-01-11T22:51:00.9963710Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:00.9963805Z fn() 2023-01-11T22:51:00.9964165Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:00.9964284Z test(self, **param_kwargs) 2023-01-11T22:51:00.9964635Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:00.9964759Z return func(*args, **kwargs) 2023-01-11T22:51:00.9965054Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:51:00.9965168Z self.run_subtests( 2023-01-11T22:51:00.9965554Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:00.9965719Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:00.9966080Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:00.9966231Z fsdp_loss = 
self._train_for_several_steps( 2023-01-11T22:51:00.9966599Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:00.9966717Z output = model(*input) 2023-01-11T22:51:00.9967038Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:00.9967175Z return forward_call(*args, **kwargs) 2023-01-11T22:51:00.9967535Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:00.9967708Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:00.9968076Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:00.9968193Z _lazy_init(state, module) 2023-01-11T22:51:00.9968544Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:00.9968711Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:00.9969101Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:00.9969242Z handle.init_flat_param_attributes() 2023-01-11T22:51:00.9969561Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:00.9969684Z return func(*args, **kwargs) 2023-01-11T22:51:00.9970064Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:00.9970166Z p_assert( 2023-01-11T22:51:00.9970496Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:00.9970618Z traceback.print_stack() 2023-01-11T22:51:00.9971359Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still 
has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9972095Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9972897Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9973627Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9974354Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9975125Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:00.9975863Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9976946Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9978081Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_exec_order_utils.py:259: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:51:00.9978191Z world_indices[ 2023-01-11T22:51:00.9979194Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_exec_order_utils.py:237: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:51:00.9979337Z (rank, world_num_valid_indices[rank]) 2023-01-11T22:51:00.9980336Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_exec_order_utils.py:259: UserWarning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:51:00.9980446Z world_indices[ 2023-01-11T22:51:00.9980556Z dist init r=1, world=2 2023-01-11T22:51:00.9980883Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.9981197Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.9981615Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.9981914Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.9982212Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.9982508Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.9982804Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.9983161Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 
2023-01-11T22:51:00.9983468Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.9983747Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.9984043Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.9984339Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:00.9984452Z dist init r=0, world=2 2023-01-11T22:51:00.9984775Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.9985086Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.9985387Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.9985686Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.9985987Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.9986287Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 
2023-01-11T22:51:00.9986586Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.9986867Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.9987167Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.9987466Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.9987766Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.9988118Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:00.9988220Z ok (5.713s) 2023-01-11T22:51:00.9988541Z test_transformer_offload_false_no_shard (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 95854 2023-01-11T22:51:00.9988758Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 95855 2023-01-11T22:51:00.9989135Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.9989308Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.9989670Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.9989862Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.9990270Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:00.9990447Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:00.9990827Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:00.9991014Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:00.9991259Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:00.9991501Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:00.9991897Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:00.9992276Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:51:00.9992507Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:00.9992729Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:00.9992960Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.9993245Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.9994258Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.9994375Z warnings.warn( 2023-01-11T22:51:00.9995381Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:00.9995491Z warnings.warn( 2023-01-11T22:51:00.9995719Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.9995942Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.9996670Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9997473Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9998210Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9998948Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:00.9999228Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.9999462Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.9999692Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:00.9999918Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0000650Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0001385Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0002118Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0002849Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0003082Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0003309Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0003541Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0003749Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0004477Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:01.0005201Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0005987Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0006712Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0006944Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0007173Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0007403Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0007671Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0008408Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0009131Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0009857Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0010587Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0011307Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0012030Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0012752Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:01.0013472Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0014191Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0014965Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0015682Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0016404Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0017912Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0018670Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0018903Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0019138Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0019350Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0019584Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0019809Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0020036Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0020760Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0021484Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0022218Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0022941Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0023658Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0024455Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0025174Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0025896Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:01.0026662Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0027390Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0028107Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0028832Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0029546Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0030263Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0030502Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0030729Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0030958Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0031169Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0032138Z /opt/conda/lib/python3.10/site-packages/torch/_tensor.py:795: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:51:01.0032367Z return torch._VF.split_with_sizes(self, split_size, dim) 2023-01-11T22:51:01.0033320Z /opt/conda/lib/python3.10/site-packages/torch/_tensor.py:795: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:51:01.0033491Z return torch._VF.split_with_sizes(self, split_size, dim) 2023-01-11T22:51:01.0033720Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0033947Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0034178Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:01.0034404Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0035181Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0035918Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0036642Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0037372Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0038094Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0038811Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0039536Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0040259Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0040977Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0041756Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0042471Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:01.0043189Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0043952Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0044676Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0044908Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0045122Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0045849Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0046566Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:01.0047288Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0048004Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0048727Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0049447Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0050225Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0050939Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0051656Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0052425Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0053154Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0053871Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0054595Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0055309Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0056026Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0057133Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0057880Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0058601Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0059413Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:01.0060127Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0060843Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0061648Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0062378Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0063094Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0063816Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0064527Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0065242Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0065962Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0066677Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0067393Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0068166Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0068881Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0069597Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0070358Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0071073Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0071788Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:01.0072511Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0073227Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0073463Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0073693Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0073917Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0074148Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0074876Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0075602Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0076320Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0077100Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0077315Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0077544Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0077774Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0078002Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0078224Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0078454Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0078732Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0078963Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0079691Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0080419Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0081207Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0081938Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0082661Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0083384Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0084104Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:01.0084805Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0085591Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0086311Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0087030Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0087795Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0088521Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0089237Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0089475Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0089709Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0089934Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0090160Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0090271Z dist init r=1, world=2 2023-01-11T22:51:01.0090377Z dist init r=0, world=2 2023-01-11T22:51:01.0090477Z ok (9.519s) 2023-01-11T22:51:01.0090780Z test_transformer_offload_false_none (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 95937 2023-01-11T22:51:01.0090995Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 95938 2023-01-11T22:51:01.0091368Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:01.0091543Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:01.0091922Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:01.0092110Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:01.0092472Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:01.0092646Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:01.0093019Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:01.0093240Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:01.0093485Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:01.0093793Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:01.0094196Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:01.0094584Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:51:01.0094808Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:01.0095034Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:01.0095263Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0095492Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0096740Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:01.0096972Z warnings.warn( 2023-01-11T22:51:01.0098094Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:01.0098207Z warnings.warn( 2023-01-11T22:51:01.0098438Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0098670Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0099415Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0100150Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0100881Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0101607Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0101836Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0102065Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0102294Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0102522Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0103257Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0104074Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0104811Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0105591Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0105835Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0106067Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0106296Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0106521Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0106744Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0106968Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0107703Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0108429Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0109161Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0109885Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0110117Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0110348Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0110560Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0110785Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0111010Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0111232Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:01.0111507Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0111729Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0112461Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0113183Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0113909Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0114675Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0114913Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0115141Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0115352Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:01.0115577Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0115806Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0116028Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0116762Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0117482Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0118207Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0118939Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0119165Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0119390Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:01.0119615Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0119839Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0120608Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0121329Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0122057Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0122838Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0123578Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:01.0124300Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0125029Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0125748Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0126473Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0127195Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0127911Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0128631Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0128917Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0129145Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0129375Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0129600Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0129825Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0130048Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0130775Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0131545Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0132537Z /opt/conda/lib/python3.10/site-packages/torch/nn/parameter.py:55: UserWarning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:51:01.0132773Z result = type(self)(self.data.clone(memory_format=torch.preserve_format), self.requires_grad) 2023-01-11T22:51:01.0133001Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0133216Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0133450Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0133677Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0134407Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0135132Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0135862Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:01.0136853Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0137154Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0137390Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0137619Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0137939Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0138167Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0138379Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0139121Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0139847Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0140653Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0141394Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0141508Z dist init r=1, world=2 2023-01-11T22:51:01.0141614Z dist init r=0, world=2 2023-01-11T22:51:01.0141714Z ok (9.819s) 2023-01-11T22:51:01.0142040Z test_transformer_offload_false_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 96020 2023-01-11T22:51:01.0142259Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 96021 2023-01-11T22:51:01.0142626Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:01.0142785Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:01.0143158Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:01.0143347Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:01.0143710Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:01.0143883Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:01.0144252Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:01.0144442Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:01.0144686Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:01.0144929Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:01.0145312Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:01.0145701Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:01.0145927Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:01.0146149Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:01.0146379Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0146665Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0147679Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:01.0147793Z warnings.warn( 2023-01-11T22:51:01.0148798Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:51:01.0148911Z warnings.warn( 2023-01-11T22:51:01.0149185Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0149403Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0150142Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0150869Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0151608Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0152343Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0152575Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0152806Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0153035Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:01.0153267Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0154005Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0154730Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0155457Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0156239Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0156472Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0156703Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0156933Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0157144Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:01.0157367Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0157595Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0158366Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0159101Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0159829Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0160561Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0160790Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0161019Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0161246Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:01.0161474Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0161700Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0161906Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0162135Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0162361Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0163090Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0163818Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0164601Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0165327Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:01.0165555Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0165783Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0166014Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0166242Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0166512Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0166726Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0167452Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0168175Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0168904Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0169631Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0169860Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0170087Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0170315Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0170539Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0171265Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0171992Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0172712Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0173490Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:01.0174209Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0174932Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0175699Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0176427Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0177553Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0178294Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0179019Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0179735Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0179970Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0180199Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0180413Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0180642Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0180866Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0181089Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0181812Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0182633Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0183616Z /opt/conda/lib/python3.10/site-packages/torch/nn/parameter.py:55: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:51:01.0183857Z result = type(self)(self.data.clone(memory_format=torch.preserve_format), self.requires_grad) 2023-01-11T22:51:01.0184141Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0184380Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0184610Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0184820Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0185551Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0186279Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:01.0187008Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0187733Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0187965Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0188196Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0188427Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0188655Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0188874Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0189099Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0189823Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0190545Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0191325Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0192048Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0192163Z dist init r=1, world=2 2023-01-11T22:51:01.0192258Z dist init r=0, world=2 2023-01-11T22:51:01.0192360Z ok (9.719s) 2023-01-11T22:51:01.0192716Z test_transformer_offload_true_no_shard (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 96103 2023-01-11T22:51:01.0192937Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 96104 2023-01-11T22:51:01.0193359Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:01.0193536Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:01.0193916Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:01.0194103Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:01.0194448Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:01.0194627Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:01.0195000Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:01.0195189Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:01.0195428Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:01.0195669Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:01.0196063Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:01.0196454Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:51:01.0196680Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:01.0196892Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:01.0197125Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0197358Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0198371Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:01.0198484Z warnings.warn( 2023-01-11T22:51:01.0199487Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:51:01.0199688Z warnings.warn( 2023-01-11T22:51:01.0199817Z File "<string>", line 1, in <module> 2023-01-11T22:51:01.0200027Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0200168Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0200353Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0200502Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0200716Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0200817Z self.run() 2023-01-11T22:51:01.0201018Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0201168Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0201572Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0201712Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0202059Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0202183Z getattr(self, test_name)() 2023-01-11T22:51:01.0202541Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0202638Z fn() 2023-01-11T22:51:01.0203000Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0203122Z test(self, **param_kwargs) 2023-01-11T22:51:01.0203476Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.0203606Z return func(*args, **kwargs) 2023-01-11T22:51:01.0203829Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0203943Z self.run_subtests( 2023-01-11T22:51:01.0204294Z File
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0204456Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0204816Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0204966Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0205337Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0205456Z output = model(*input) 2023-01-11T22:51:01.0205762Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0205904Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0206284Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0206459Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0206821Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0206942Z _lazy_init(state, module) 2023-01-11T22:51:01.0207294Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0207461Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0207839Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0208039Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.0208378Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0208508Z return func(*args, **kwargs) 2023-01-11T22:51:01.0208881Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0208982Z p_assert( 2023-01-11T22:51:01.0209312Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0209438Z traceback.print_stack() 2023-01-11T22:51:01.0209549Z File "", line 1, in 2023-01-11T22:51:01.0209755Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0209897Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0210097Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0210247Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0210454Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0210560Z self.run() 2023-01-11T22:51:01.0210789Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0210941Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0211279Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0211409Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0211768Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0211891Z getattr(self, test_name)() 2023-01-11T22:51:01.0212245Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0212340Z fn() 2023-01-11T22:51:01.0212687Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0212811Z test(self, **param_kwargs) 2023-01-11T22:51:01.0213167Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:01.0213292Z return func(*args, **kwargs) 2023-01-11T22:51:01.0213526Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0213637Z self.run_subtests( 2023-01-11T22:51:01.0213986Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0214147Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0214488Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0214642Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0215014Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0215132Z output = model(*input) 2023-01-11T22:51:01.0215453Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0215592Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0215963Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0216136Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0216482Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0216904Z _lazy_init(state, module) 2023-01-11T22:51:01.0217561Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0217839Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0218251Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0218395Z handle.init_flat_param_attributes() 
2023-01-11T22:51:01.0218731Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0218854Z return func(*args, **kwargs) 2023-01-11T22:51:01.0219207Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0219310Z p_assert( 2023-01-11T22:51:01.0219644Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0219768Z traceback.print_stack() 2023-01-11T22:51:01.0220002Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0220239Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0221040Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0221787Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0222520Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0223264Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0223992Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0224718Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0225447Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0226171Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0226891Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:01.0227668Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0227800Z File "", line 1, in 2023-01-11T22:51:01.0227994Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0228137Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0228340Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0228488Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0228698Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0228803Z self.run() 2023-01-11T22:51:01.0229001Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0229145Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0229515Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0229655Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0230018Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0230140Z getattr(self, test_name)() 2023-01-11T22:51:01.0230494Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0230591Z fn() 2023-01-11T22:51:01.0230953Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0231077Z test(self, **param_kwargs) 2023-01-11T22:51:01.0231416Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 
167, in wrapper 2023-01-11T22:51:01.0231539Z return func(*args, **kwargs) 2023-01-11T22:51:01.0231778Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0231890Z self.run_subtests( 2023-01-11T22:51:01.0232236Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0239899Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0240352Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0240499Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0240878Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0241001Z output = model(*input) 2023-01-11T22:51:01.0241316Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0241455Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0241838Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0242001Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0242359Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0242468Z _lazy_init(state, module) 2023-01-11T22:51:01.0242813Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0242981Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0243361Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0243613Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:01.0243961Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0244086Z return func(*args, **kwargs) 2023-01-11T22:51:01.0244461Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0244564Z p_assert( 2023-01-11T22:51:01.0244897Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0245023Z traceback.print_stack() 2023-01-11T22:51:01.0245135Z File "", line 1, in 2023-01-11T22:51:01.0245345Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0245484Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0245684Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0245836Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0246096Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0246208Z self.run() 2023-01-11T22:51:01.0246392Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0246536Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0246879Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0247011Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0247368Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0247490Z getattr(self, test_name)() 2023-01-11T22:51:01.0247845Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0247946Z fn() 2023-01-11T22:51:01.0248290Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0248417Z test(self, **param_kwargs) 2023-01-11T22:51:01.0248773Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.0248895Z return func(*args, **kwargs) 2023-01-11T22:51:01.0249131Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0249244Z self.run_subtests( 2023-01-11T22:51:01.0249595Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0249756Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0250102Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0250256Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0250630Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0250751Z output = model(*input) 2023-01-11T22:51:01.0251077Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0251213Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0251586Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0251758Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0252104Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0252224Z _lazy_init(state, module) 2023-01-11T22:51:01.0252577Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 
2023-01-11T22:51:01.0252802Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0253205Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0253348Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.0253682Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0253807Z return func(*args, **kwargs) 2023-01-11T22:51:01.0254163Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0254266Z p_assert( 2023-01-11T22:51:01.0254599Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0254723Z traceback.print_stack() 2023-01-11T22:51:01.0254960Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0255192Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:01.0255369Z File "", line 1, in 2023-01-11T22:51:01.0255585Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0255709Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0255908Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0256058Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0256265Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0256369Z self.run() 2023-01-11T22:51:01.0256865Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0257033Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0257371Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0257510Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0257874Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0257998Z getattr(self, test_name)() 2023-01-11T22:51:01.0258358Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0258452Z fn() 2023-01-11T22:51:01.0258812Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0258935Z test(self, **param_kwargs) 2023-01-11T22:51:01.0259271Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.0259396Z return func(*args, **kwargs) 2023-01-11T22:51:01.0259633Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0259749Z self.run_subtests( 2023-01-11T22:51:01.0260106Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0260267Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0260627Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0260779Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0261131Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0261251Z output = model(*input) 2023-01-11T22:51:01.0261571Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0261709Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0262084Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0262362Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0262736Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0262860Z _lazy_init(state, module) 2023-01-11T22:51:01.0263192Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0263359Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0263753Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0263897Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.0264232Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0264361Z return func(*args, **kwargs) 2023-01-11T22:51:01.0264737Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0264897Z p_assert( 2023-01-11T22:51:01.0265247Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0265356Z traceback.print_stack() 2023-01-11T22:51:01.0265480Z File "", line 1, in 2023-01-11T22:51:01.0265686Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0265826Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0266027Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0266176Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0266386Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0266476Z self.run() 2023-01-11T22:51:01.0266678Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0266821Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0267164Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0267297Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0267657Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0267779Z getattr(self, test_name)() 2023-01-11T22:51:01.0268136Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0268218Z fn() 2023-01-11T22:51:01.0268579Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0268701Z test(self, **param_kwargs) 2023-01-11T22:51:01.0269057Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:01.0269182Z return func(*args, **kwargs) 2023-01-11T22:51:01.0269422Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0269535Z self.run_subtests( 2023-01-11T22:51:01.0269888Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0270033Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0270394Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0270544Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0270918Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0271036Z output = model(*input) 2023-01-11T22:51:01.0271420Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0271558Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0271934Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0272091Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0272456Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0272577Z _lazy_init(state, module) 2023-01-11T22:51:01.0272927Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0273093Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0273484Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0273628Z handle.init_flat_param_attributes() 
2023-01-11T22:51:01.0274023Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0274136Z return func(*args, **kwargs) 2023-01-11T22:51:01.0274515Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0274614Z p_assert( 2023-01-11T22:51:01.0274948Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0275073Z traceback.print_stack() 2023-01-11T22:51:01.0275308Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0275541Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0275670Z File "", line 1, in 2023-01-11T22:51:01.0275866Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0276006Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0276210Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0276361Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0276571Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0276674Z self.run() 2023-01-11T22:51:01.0276874Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0277003Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0277344Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0277475Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0277833Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0277959Z getattr(self, test_name)() 2023-01-11T22:51:01.0278314Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0278413Z fn() 2023-01-11T22:51:01.0278776Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0278881Z test(self, **param_kwargs) 2023-01-11T22:51:01.0279234Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.0279357Z return func(*args, **kwargs) 2023-01-11T22:51:01.0279591Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0279705Z self.run_subtests( 2023-01-11T22:51:01.0280050Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0280267Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0280628Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0280766Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0281139Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0281257Z output = model(*input) 2023-01-11T22:51:01.0281580Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0281716Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0282090Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0282261Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0282623Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0282730Z 
_lazy_init(state, module) 2023-01-11T22:51:01.0283127Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0283301Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0283698Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0283841Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.0284174Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0284298Z return func(*args, **kwargs) 2023-01-11T22:51:01.0284673Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0284759Z p_assert( 2023-01-11T22:51:01.0285096Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0285222Z traceback.print_stack() 2023-01-11T22:51:01.0285350Z File "", line 1, in 2023-01-11T22:51:01.0285554Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0285695Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0285894Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0286045Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0286238Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0286339Z self.run() 2023-01-11T22:51:01.0286538Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0286683Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0287019Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0287153Z self.run_test(test_name, pipe) 
2023-01-11T22:51:01.0287514Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0287638Z getattr(self, test_name)() 2023-01-11T22:51:01.0287977Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0288073Z fn() 2023-01-11T22:51:01.0288433Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0288554Z test(self, **param_kwargs) 2023-01-11T22:51:01.0288909Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.0289031Z return func(*args, **kwargs) 2023-01-11T22:51:01.0289268Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0289418Z self.run_subtests( 2023-01-11T22:51:01.0289773Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0289935Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0290294Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0290442Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0290813Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0290931Z output = model(*input) 2023-01-11T22:51:01.0291251Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0291371Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0291744Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0291920Z args, 
kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0292330Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0292456Z _lazy_init(state, module) 2023-01-11T22:51:01.0292810Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0292977Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0293431Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0293572Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.0293891Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0294019Z return func(*args, **kwargs) 2023-01-11T22:51:01.0294392Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0294494Z p_assert( 2023-01-11T22:51:01.0294828Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0294953Z traceback.print_stack() 2023-01-11T22:51:01.0295700Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0296438Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:01.0297687Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0298440Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0299170Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0300014Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0300742Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0301460Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0301699Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0301973Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0302113Z File "", line 1, in 2023-01-11T22:51:01.0302321Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0302464Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0302667Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0302816Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0303024Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0303128Z self.run() 2023-01-11T22:51:01.0303312Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0303458Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0303807Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0303943Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0304302Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0304428Z getattr(self, test_name)() 2023-01-11T22:51:01.0304785Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0304867Z fn() 2023-01-11T22:51:01.0305227Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0305349Z test(self, **param_kwargs) 2023-01-11T22:51:01.0305703Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 
2023-01-11T22:51:01.0305830Z return func(*args, **kwargs) 2023-01-11T22:51:01.0306068Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0306184Z self.run_subtests( 2023-01-11T22:51:01.0306535Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0306680Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0307042Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0307192Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0307561Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0307678Z output = model(*input) 2023-01-11T22:51:01.0308002Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0308212Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0308594Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0308751Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0309118Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0309238Z _lazy_init(state, module) 2023-01-11T22:51:01.0309589Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0309756Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0310153Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0310294Z handle.init_flat_param_attributes() 
2023-01-11T22:51:01.0310634Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0310757Z return func(*args, **kwargs) 2023-01-11T22:51:01.0311162Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0311270Z p_assert( 2023-01-11T22:51:01.0311608Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0311732Z traceback.print_stack() 2023-01-11T22:51:01.0311864Z File "", line 1, in 2023-01-11T22:51:01.0312072Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0312213Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0312397Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0312548Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0312762Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0312864Z self.run() 2023-01-11T22:51:01.0313068Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0313213Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0313550Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0313682Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0314026Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0314146Z getattr(self, test_name)() 2023-01-11T22:51:01.0314499Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0314597Z fn() 2023-01-11T22:51:01.0314959Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 248, in instantiated_test 2023-01-11T22:51:01.0315085Z test(self, **param_kwargs) 2023-01-11T22:51:01.0315443Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.0315566Z return func(*args, **kwargs) 2023-01-11T22:51:01.0315786Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0315898Z self.run_subtests( 2023-01-11T22:51:01.0316247Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0316405Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0316764Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0316912Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0317282Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0317455Z output = model(*input) 2023-01-11T22:51:01.0317769Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0317905Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0318277Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0318451Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0318815Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0318935Z _lazy_init(state, module) 2023-01-11T22:51:01.0319283Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0319449Z _share_state_and_init_handle_attrs(state, root_module) 
2023-01-11T22:51:01.0319831Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0320019Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.0320367Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0320492Z return func(*args, **kwargs) 2023-01-11T22:51:01.0320863Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0320963Z p_assert( 2023-01-11T22:51:01.0321297Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0321420Z traceback.print_stack() 2023-01-11T22:51:01.0321638Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0321869Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:01.0322002Z File "", line 1, in 2023-01-11T22:51:01.0322212Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0322353Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0322553Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0322701Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0322893Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0322996Z self.run() 2023-01-11T22:51:01.0323191Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0323335Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0323676Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0323808Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0323938Z File "", line 1, in 2023-01-11T22:51:01.0324298Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0324407Z getattr(self, test_name)() 2023-01-11T22:51:01.0324763Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0324859Z fn() 2023-01-11T22:51:01.0325065Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0325204Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0325569Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0325689Z test(self, **param_kwargs) 2023-01-11T22:51:01.0325889Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0326022Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0326442Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.0326566Z return func(*args, **kwargs) 2023-01-11T22:51:01.0326780Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0326885Z self.run() 2023-01-11T22:51:01.0327121Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0327236Z self.run_subtests( 2023-01-11T22:51:01.0327418Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0327564Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0327917Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0328079Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0328416Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0328548Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0328951Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0329107Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0329450Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0329574Z getattr(self, test_name)() 2023-01-11T22:51:01.0329946Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0330065Z output = model(*input) 2023-01-11T22:51:01.0330418Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0330513Z fn() 2023-01-11T22:51:01.0330841Z File 
"/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0330978Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0331326Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0331447Z test(self, **param_kwargs) 2023-01-11T22:51:01.0331818Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0331990Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0332346Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.0332471Z return func(*args, **kwargs) 2023-01-11T22:51:01.0332832Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0332955Z _lazy_init(state, module) 2023-01-11T22:51:01.0333175Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0333288Z self.run_subtests( 2023-01-11T22:51:01.0333641Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0333808Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0334152Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0334310Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0334704Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0334844Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.0335186Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 
1024, in _test_fsdp_parity 2023-01-11T22:51:01.0335391Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0335734Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0335858Z return func(*args, **kwargs) 2023-01-11T22:51:01.0336226Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0336344Z output = model(*input) 2023-01-11T22:51:01.0336914Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0337026Z p_assert( 2023-01-11T22:51:01.0337345Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0337480Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0337811Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0337942Z traceback.print_stack() 2023-01-11T22:51:01.0338403Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0338588Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0338952Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0339071Z _lazy_init(state, module) 2023-01-11T22:51:01.0339404Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0339573Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0339962Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0340104Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.0340444Z 
File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0340567Z return func(*args, **kwargs) 2023-01-11T22:51:01.0340942Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0341043Z p_assert( 2023-01-11T22:51:01.0341379Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0341488Z traceback.print_stack() 2023-01-11T22:51:01.0341720Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0341955Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0342082Z File "", line 1, in 2023-01-11T22:51:01.0342287Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0342429Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0342631Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0342762Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0342974Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0343075Z self.run() 2023-01-11T22:51:01.0343276Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0343422Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0343761Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0343892Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0344248Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0344354Z getattr(self, test_name)() 2023-01-11T22:51:01.0344711Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0344880Z fn() 2023-01-11T22:51:01.0345249Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0345370Z test(self, **param_kwargs) 2023-01-11T22:51:01.0345723Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.0345845Z return func(*args, **kwargs) 2023-01-11T22:51:01.0346082Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0346179Z self.run_subtests( 2023-01-11T22:51:01.0346528Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0346687Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0347048Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0347203Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0347659Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0347784Z output = model(*input) 2023-01-11T22:51:01.0348111Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0348233Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0348604Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0348774Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0349138Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0349257Z 
_lazy_init(state, module) 2023-01-11T22:51:01.0349613Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0349781Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0350176Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0350301Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.0350634Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0350755Z return func(*args, **kwargs) 2023-01-11T22:51:01.0351126Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0351229Z p_assert( 2023-01-11T22:51:01.0351562Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0351689Z traceback.print_stack() 2023-01-11T22:51:01.0351815Z File "", line 1, in 2023-01-11T22:51:01.0352006Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0352146Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0352343Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0352493Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0352700Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0352802Z self.run() 2023-01-11T22:51:01.0353001Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0353129Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0353467Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0353598Z self.run_test(test_name, pipe) 
2023-01-11T22:51:01.0354011Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0354130Z getattr(self, test_name)() 2023-01-11T22:51:01.0354487Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0354583Z fn() 2023-01-11T22:51:01.0354943Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0355050Z test(self, **param_kwargs) 2023-01-11T22:51:01.0355401Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.0355524Z return func(*args, **kwargs) 2023-01-11T22:51:01.0355760Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0355875Z self.run_subtests( 2023-01-11T22:51:01.0356227Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0356387Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0356790Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0356931Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0357307Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0357425Z output = model(*input) 2023-01-11T22:51:01.0357748Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0357884Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0358257Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0358432Z args, 
kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0358794Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0358902Z _lazy_init(state, module) 2023-01-11T22:51:01.0359255Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0359419Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0359813Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0359953Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.0360290Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0360412Z return func(*args, **kwargs) 2023-01-11T22:51:01.0360784Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0360873Z p_assert( 2023-01-11T22:51:01.0361211Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0361333Z traceback.print_stack() 2023-01-11T22:51:01.0362078Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0362817Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:01.0363615Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0364349Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0365077Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0365846Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0366583Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0367307Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0368037Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0368758Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0369486Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0370212Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0370939Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0371659Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0372384Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0373160Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0373400Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0373632Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:01.0373762Z File "", line 1, in 2023-01-11T22:51:01.0373972Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0374101Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0374307Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0374501Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0374719Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0374824Z self.run() 2023-01-11T22:51:01.0375024Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0375165Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0375492Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0375624Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0375985Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0376106Z getattr(self, test_name)() 2023-01-11T22:51:01.0376470Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0376817Z fn() 2023-01-11T22:51:01.0377404Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0377534Z test(self, **param_kwargs) 2023-01-11T22:51:01.0377877Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.0378006Z return func(*args, **kwargs) 2023-01-11T22:51:01.0378243Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0378357Z self.run_subtests( 2023-01-11T22:51:01.0378708Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0378870Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0379237Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0379389Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0379749Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0379870Z output = model(*input) 2023-01-11T22:51:01.0380195Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0380334Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0380714Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0380887Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0381315Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0381534Z _lazy_init(state, module) 2023-01-11T22:51:01.0381877Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0382049Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0382443Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0382585Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.0382923Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0383049Z return func(*args, **kwargs) 2023-01-11T22:51:01.0383427Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0383529Z p_assert( 2023-01-11T22:51:01.0383844Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0383971Z traceback.print_stack() 2023-01-11T22:51:01.0384097Z File "", line 1, in 2023-01-11T22:51:01.0384362Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0384513Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0384715Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0384862Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0385071Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0385157Z self.run() 2023-01-11T22:51:01.0385357Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0385501Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0385844Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0385979Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0386336Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0386462Z getattr(self, test_name)() 2023-01-11T22:51:01.0386821Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0386902Z fn() 2023-01-11T22:51:01.0387264Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0387386Z test(self, **param_kwargs) 2023-01-11T22:51:01.0387741Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:01.0387866Z return func(*args, **kwargs) 2023-01-11T22:51:01.0388103Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0388219Z self.run_subtests( 2023-01-11T22:51:01.0388570Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0388718Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0389077Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0389228Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0389599Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0389719Z output = model(*input) 2023-01-11T22:51:01.0390044Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0390181Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0390555Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0390773Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0391145Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0391264Z _lazy_init(state, module) 2023-01-11T22:51:01.0391614Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0391781Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0392172Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0392313Z handle.init_flat_param_attributes() 
2023-01-11T22:51:01.0392648Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0392755Z return func(*args, **kwargs) 2023-01-11T22:51:01.0393180Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0393287Z p_assert( 2023-01-11T22:51:01.0393675Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0393807Z traceback.print_stack() 2023-01-11T22:51:01.0394042Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0394274Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0394407Z File "", line 1, in 2023-01-11T22:51:01.0394599Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0394739Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0394938Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0395085Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0395299Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0395402Z self.run() 2023-01-11T22:51:01.0395604Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0395732Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0396073Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0396204Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0396558Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0396680Z getattr(self, test_name)() 2023-01-11T22:51:01.0397036Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0397134Z fn() 2023-01-11T22:51:01.0397495Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0397603Z test(self, **param_kwargs) 2023-01-11T22:51:01.0397955Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.0398079Z return func(*args, **kwargs) 2023-01-11T22:51:01.0398314Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0398426Z self.run_subtests( 2023-01-11T22:51:01.0398773Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0398932Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0399290Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0399424Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0399855Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0399974Z output = model(*input) 2023-01-11T22:51:01.0400300Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0400439Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0400813Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0400985Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0401347Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0401449Z 
_lazy_init(state, module) 2023-01-11T22:51:01.0401800Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0401965Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0402361Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0402561Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.0402908Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0403033Z return func(*args, **kwargs) 2023-01-11T22:51:01.0403410Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0403495Z p_assert( 2023-01-11T22:51:01.0403830Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0403953Z traceback.print_stack() 2023-01-11T22:51:01.0404082Z File "", line 1, in 2023-01-11T22:51:01.0404289Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0404434Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0404635Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0404787Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0404979Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0405084Z self.run() 2023-01-11T22:51:01.0405282Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0405425Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0405763Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0405894Z self.run_test(test_name, pipe) 
2023-01-11T22:51:01.0406249Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0406369Z getattr(self, test_name)() 2023-01-11T22:51:01.0406711Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0406808Z fn() 2023-01-11T22:51:01.0407172Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0407295Z test(self, **param_kwargs) 2023-01-11T22:51:01.0407650Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.0407774Z return func(*args, **kwargs) 2023-01-11T22:51:01.0408011Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0408107Z self.run_subtests( 2023-01-11T22:51:01.0408456Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0408618Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0409038Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0409191Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0409566Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0409685Z output = model(*input) 2023-01-11T22:51:01.0410007Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0410128Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0410502Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0410673Z args, 
kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0411037Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0411161Z _lazy_init(state, module) 2023-01-11T22:51:01.0411512Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0411727Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0412127Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0412271Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.0412589Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0412715Z return func(*args, **kwargs) 2023-01-11T22:51:01.0413087Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0413189Z p_assert( 2023-01-11T22:51:01.0413520Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0413650Z traceback.print_stack() 2023-01-11T22:51:01.0413885Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0414121Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:01.0414233Z File "", line 1, in 2023-01-11T22:51:01.0414438Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0414582Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0414781Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0414929Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0415138Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0415240Z self.run() 2023-01-11T22:51:01.0415423Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0415571Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0415912Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0416046Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0416404Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0416527Z getattr(self, test_name)() 2023-01-11T22:51:01.0417465Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0417570Z fn() 2023-01-11T22:51:01.0417922Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0418044Z test(self, **param_kwargs) 2023-01-11T22:51:01.0418397Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.0418619Z return func(*args, **kwargs) 2023-01-11T22:51:01.0418858Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0418975Z self.run_subtests( 2023-01-11T22:51:01.0419328Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0419489Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0419833Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0419984Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0420356Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0420475Z output = model(*input) 2023-01-11T22:51:01.0420800Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0420939Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0421374Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0421555Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0421907Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0422028Z _lazy_init(state, module) 2023-01-11T22:51:01.0422378Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0422545Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0422939Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0423081Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.0423423Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0423546Z return func(*args, **kwargs) 2023-01-11T22:51:01.0423908Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0424010Z p_assert( 2023-01-11T22:51:01.0424344Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0424468Z traceback.print_stack() 2023-01-11T22:51:01.0424597Z File "", line 1, in 2023-01-11T22:51:01.0424803Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0424945Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0425145Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0425279Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0425490Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0425592Z self.run() 2023-01-11T22:51:01.0425794Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0425941Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0426279Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0426409Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0426749Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0426872Z getattr(self, test_name)() 2023-01-11T22:51:01.0427226Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0427323Z fn() 2023-01-11T22:51:01.0427683Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0427861Z test(self, **param_kwargs) 2023-01-11T22:51:01.0428219Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:01.0428343Z return func(*args, **kwargs) 2023-01-11T22:51:01.0428562Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0428674Z self.run_subtests( 2023-01-11T22:51:01.0429023Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0429181Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0429540Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0429690Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0430061Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0430182Z output = model(*input) 2023-01-11T22:51:01.0430532Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0430674Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0431051Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0431225Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0431588Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0431706Z _lazy_init(state, module) 2023-01-11T22:51:01.0432056Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0432222Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0432602Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0432746Z handle.init_flat_param_attributes() 
2023-01-11T22:51:01.0433081Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0433205Z return func(*args, **kwargs) 2023-01-11T22:51:01.0433578Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0433680Z p_assert( 2023-01-11T22:51:01.0434012Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0434136Z traceback.print_stack() 2023-01-11T22:51:01.0434882Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0435628Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0436364Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0437082Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:01.0437878Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0438602Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0439331Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0440105Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0440349Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0440582Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:01.0440712Z File "", line 1, in 2023-01-11T22:51:01.0440922Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0441064Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0441269Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0441418Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0441615Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0441719Z self.run() 2023-01-11T22:51:01.0441919Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0442064Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0442407Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0442538Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0442894Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0443017Z getattr(self, test_name)() 2023-01-11T22:51:01.0443353Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0443453Z fn() 2023-01-11T22:51:01.0443815Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0443939Z test(self, **param_kwargs) 2023-01-11T22:51:01.0444294Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.0444418Z return func(*args, **kwargs) 2023-01-11T22:51:01.0444653Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0444767Z self.run_subtests( 2023-01-11T22:51:01.0445100Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0445261Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0445618Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0445827Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0446207Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0446328Z output = model(*input) 2023-01-11T22:51:01.0446651Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0446787Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0447142Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0447316Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0447680Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0447800Z _lazy_init(state, module) 2023-01-11T22:51:01.0448152Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0448322Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0448762Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0448909Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.0449230Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0449355Z return func(*args, **kwargs) 2023-01-11T22:51:01.0449727Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0449828Z p_assert( 2023-01-11T22:51:01.0450159Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0450285Z traceback.print_stack() 2023-01-11T22:51:01.0450417Z File "", line 1, in 2023-01-11T22:51:01.0450627Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0450752Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0450957Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0451107Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0451314Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0451417Z self.run() 2023-01-11T22:51:01.0451617Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0451760Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0452081Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0452212Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0452567Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0452692Z getattr(self, test_name)() 2023-01-11T22:51:01.0453052Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0453149Z fn() 2023-01-11T22:51:01.0453511Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0453634Z test(self, **param_kwargs) 2023-01-11T22:51:01.0453971Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:01.0454095Z return func(*args, **kwargs) 2023-01-11T22:51:01.0454330Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0454442Z self.run_subtests( 2023-01-11T22:51:01.0454792Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0455009Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0455379Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0455531Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0455888Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0456008Z output = model(*input) 2023-01-11T22:51:01.0456330Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0456467Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0457048Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0457224Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0457593Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0457713Z _lazy_init(state, module) 2023-01-11T22:51:01.0458118Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0458294Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0458694Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0458840Z handle.init_flat_param_attributes() 
2023-01-11T22:51:01.0459171Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0459296Z return func(*args, **kwargs) 2023-01-11T22:51:01.0459671Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0459777Z p_assert( 2023-01-11T22:51:01.0460111Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0460223Z traceback.print_stack() 2023-01-11T22:51:01.0460457Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0460689Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0460817Z File "", line 1, in 2023-01-11T22:51:01.0460940Z File "", line 1, in 2023-01-11T22:51:01.0461149Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0461289Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0461469Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0461618Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0461825Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0461966Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0462180Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0462283Z self.run() 2023-01-11T22:51:01.0462480Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0462627Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0462811Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0462956Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0463164Z File 
"/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0463267Z self.run() 2023-01-11T22:51:01.0463607Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0463739Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0464011Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0464138Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0464503Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0464625Z getattr(self, test_name)() 2023-01-11T22:51:01.0464960Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0465088Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0465443Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0465540Z fn() 2023-01-11T22:51:01.0465896Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0466002Z getattr(self, test_name)() 2023-01-11T22:51:01.0466365Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0466489Z test(self, **param_kwargs) 2023-01-11T22:51:01.0466900Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0467004Z fn() 2023-01-11T22:51:01.0467361Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.0467485Z return func(*args, **kwargs) 2023-01-11T22:51:01.0467843Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 
2023-01-11T22:51:01.0467948Z test(self, **param_kwargs) 2023-01-11T22:51:01.0468186Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0468298Z self.run_subtests( 2023-01-11T22:51:01.0468655Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.0468784Z return func(*args, **kwargs) 2023-01-11T22:51:01.0469135Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0469295Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0469531Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0469628Z self.run_subtests( 2023-01-11T22:51:01.0469992Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0470142Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0470487Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0470643Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0471018Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0471137Z output = model(*input) 2023-01-11T22:51:01.0471495Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0471628Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0471951Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0472088Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0472460Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0472577Z output = model(*input) 2023-01-11T22:51:01.0472949Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0473176Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0473502Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0473627Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0473993Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0474112Z _lazy_init(state, module) 2023-01-11T22:51:01.0474486Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0474656Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0475008Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0475174Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0475536Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0475643Z _lazy_init(state, module) 2023-01-11T22:51:01.0476084Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0476233Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.0476583Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0476748Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0477081Z File 
"/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0477205Z return func(*args, **kwargs) 2023-01-11T22:51:01.0477598Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0477738Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.0478102Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0478203Z p_assert( 2023-01-11T22:51:01.0478542Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0478663Z return func(*args, **kwargs) 2023-01-11T22:51:01.0478998Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0479124Z traceback.print_stack() 2023-01-11T22:51:01.0479496Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0479582Z p_assert( 2023-01-11T22:51:01.0479909Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0480030Z traceback.print_stack() 2023-01-11T22:51:01.0480267Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0480501Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0481248Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:01.0481986Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0482719Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0483510Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0484238Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0484965Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0485750Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0486487Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0487211Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0487940Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0488664Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0489385Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0489627Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:01.0489859Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0490071Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0490299Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0490525Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0490751Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0491480Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0492300Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0493030Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0493809Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:01.0494585Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0495319Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0496044Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0497229Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0497994Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0498720Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0499448Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0500174Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0500894Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0501720Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0501955Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0502188Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0502417Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0502647Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:01.0503420Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0504151Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0504874Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0505608Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0506333Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0507055Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0507788Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0508506Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0509226Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0510006Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0510726Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0511447Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0512214Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0512940Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0513659Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0514382Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0515104Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:01.0515820Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0516541Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0517261Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0517978Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0518700Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0519476Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0520193Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0520431Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0520668Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0520940Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0521177Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0521907Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0522633Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0523362Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:01.0524086Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0524806Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0525534Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0526253Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0526972Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0527747Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0528463Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0529181Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0529955Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0530689Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0531405Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0531642Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:01.0531860Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0532088Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0532317Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0532546Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0532770Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0533498Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0534226Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0534949Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0535670Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:01.0536457Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0537384Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0538110Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0538906Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0539645Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0540365Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0541088Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0541806Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0542520Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0543244Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0543476Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0543704Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0543918Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0544148Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:01.0544875Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0545680Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0546401Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0547120Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0547888Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0548617Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0549337Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0550058Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0550775Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0551494Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0552222Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0552940Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0553659Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0554428Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0554661Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0554892Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0555121Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0555354Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0555579Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0555790Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0556563Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:01.0557294Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0558020Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0558752Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0559471Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0560188Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0560914Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0561638Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0562357Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0563134Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0563855Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0564573Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0565333Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0566061Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0566296Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0566526Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0566760Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0566991Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0567718Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0568436Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0569162Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0569885Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0570115Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0570343Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0570553Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0570782Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0571572Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0572297Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0573023Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:01.0573789Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0574517Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0575237Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0575963Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0577025Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0577876Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0578602Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0579324Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0580042Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0580869Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0581589Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0581823Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:01.0582055Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0582285Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0582519Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0582802Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0583019Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0583751Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0584476Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0585209Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0585936Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:01.0586659Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0587388Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0588109Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0588832Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0589612Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0590331Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0591053Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0591826Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0592558Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0593323Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0593561Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0593795Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0593907Z dist init r=0, world=2 2023-01-11T22:51:01.0594234Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 
2023-01-11T22:51:01.0594549Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:01.0594853Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:01.0595156Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:01.0595461Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:01.0595745Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:01.0596042Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:01.0596336Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:01.0596632Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:01.0596986Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:01.0597287Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 
2023-01-11T22:51:01.0597585Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:01.0597698Z dist init r=1, world=2 2023-01-11T22:51:01.0598018Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:01.0598329Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:01.0598639Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:01.0598968Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:01.0599279Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:01.0599580Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:01.0599877Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:01.0600177Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:01.0600481Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 
2023-01-11T22:51:01.0600780Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:01.0601078Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:01.0601377Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:01.0601478Z ok (10.520s) 2023-01-11T22:51:01.0601790Z test_transformer_offload_true_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 96186 2023-01-11T22:51:01.0601994Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 96187 2023-01-11T22:51:01.0602381Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:01.0602555Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:01.0602934Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:01.0603123Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:01.0603488Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:01.0603662Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:01.0604032Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:01.0604276Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:01.0604506Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for 
rank: 0 2023-01-11T22:51:01.0604749Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:01.0605150Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:01.0605543Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:01.0605770Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:01.0605998Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:01.0606232Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0606467Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0607528Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:01.0607648Z warnings.warn( 2023-01-11T22:51:01.0608666Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:51:01.0608764Z warnings.warn( 2023-01-11T22:51:01.0608896Z File "", line 1, in 2023-01-11T22:51:01.0609106Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0609247Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0609449Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0609601Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0609811Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0609913Z self.run() 2023-01-11T22:51:01.0610096Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0610241Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0610587Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0610719Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0611083Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0611206Z getattr(self, test_name)() 2023-01-11T22:51:01.0611566Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0611646Z fn() 2023-01-11T22:51:01.0612010Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0612131Z test(self, **param_kwargs) 2023-01-11T22:51:01.0612485Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.0612609Z return func(*args, **kwargs) 2023-01-11T22:51:01.0612902Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0613015Z self.run_subtests( 2023-01-11T22:51:01.0613370Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0613514Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0613876Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0614027Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0614399Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0614516Z output = model(*input) 2023-01-11T22:51:01.0614842Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0614978Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0615358Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0615566Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0615941Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0616063Z _lazy_init(state, module) 2023-01-11T22:51:01.0616414Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0616783Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0617506Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0617656Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.0617999Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0618134Z return func(*args, **kwargs) 2023-01-11T22:51:01.0618495Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0618597Z p_assert( 2023-01-11T22:51:01.0618932Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0619057Z traceback.print_stack() 2023-01-11T22:51:01.0619186Z File "", line 1, in 2023-01-11T22:51:01.0619391Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0619529Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0619711Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0619859Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0620068Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0620174Z self.run() 2023-01-11T22:51:01.0620374Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0620523Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0620861Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0620993Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0621334Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0621457Z getattr(self, test_name)() 2023-01-11T22:51:01.0621810Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0621909Z fn() 2023-01-11T22:51:01.0622271Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0622498Z test(self, **param_kwargs) 2023-01-11T22:51:01.0622855Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:01.0622979Z return func(*args, **kwargs) 2023-01-11T22:51:01.0623204Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0623317Z self.run_subtests( 2023-01-11T22:51:01.0623669Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0623830Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0624188Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0624340Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0624709Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0624831Z output = model(*input) 2023-01-11T22:51:01.0625139Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0625369Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0625757Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0625929Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0626293Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0626414Z _lazy_init(state, module) 2023-01-11T22:51:01.0626762Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0626929Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0627307Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0627453Z handle.init_flat_param_attributes() 
2023-01-11T22:51:01.0627791Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0627915Z return func(*args, **kwargs) 2023-01-11T22:51:01.0628291Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0628392Z p_assert( 2023-01-11T22:51:01.0628727Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0628852Z traceback.print_stack() 2023-01-11T22:51:01.0629069Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0629302Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0630057Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0630800Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0631533Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0632268Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0633053Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0633782Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0634550Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0635291Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0636018Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:01.0636742Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0636878Z File "", line 1, in 2023-01-11T22:51:01.0637088Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0637229Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0637431Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0637579Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0637774Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0637879Z self.run() 2023-01-11T22:51:01.0638081Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0638229Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0638572Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0638706Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0639069Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0639193Z getattr(self, test_name)() 2023-01-11T22:51:01.0639536Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0639633Z fn() 2023-01-11T22:51:01.0639995Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0640116Z test(self, **param_kwargs) 2023-01-11T22:51:01.0640470Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 
167, in wrapper 2023-01-11T22:51:01.0640650Z return func(*args, **kwargs) 2023-01-11T22:51:01.0640888Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0640984Z self.run_subtests( 2023-01-11T22:51:01.0641342Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0641502Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0641863Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0642014Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0642389Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0642508Z output = model(*input) 2023-01-11T22:51:01.0642829Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0642953Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0643327Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0643544Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0643919Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0644042Z _lazy_init(state, module) 2023-01-11T22:51:01.0644395Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0644562Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0644957Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0645097Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:01.0645422Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0645549Z return func(*args, **kwargs) 2023-01-11T22:51:01.0645928Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0646029Z p_assert( 2023-01-11T22:51:01.0646366Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0646491Z traceback.print_stack() 2023-01-11T22:51:01.0646619Z File "", line 1, in 2023-01-11T22:51:01.0646808Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0646950Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0647149Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0647298Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0647513Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0647616Z self.run() 2023-01-11T22:51:01.0647815Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0647963Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0648285Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0648418Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0648775Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0648897Z getattr(self, test_name)() 2023-01-11T22:51:01.0649252Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0649349Z fn() 2023-01-11T22:51:01.0649710Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0649888Z test(self, **param_kwargs) 2023-01-11T22:51:01.0650226Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.0650355Z return func(*args, **kwargs) 2023-01-11T22:51:01.0650594Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0650708Z self.run_subtests( 2023-01-11T22:51:01.0651057Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0651220Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0651578Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0651727Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0652083Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0652207Z output = model(*input) 2023-01-11T22:51:01.0652586Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0652732Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0653107Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0653284Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0653648Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0653771Z _lazy_init(state, module) 2023-01-11T22:51:01.0654104Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 
2023-01-11T22:51:01.0654270Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0654669Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0654811Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.0655151Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0655276Z return func(*args, **kwargs) 2023-01-11T22:51:01.0655648Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0655750Z p_assert( 2023-01-11T22:51:01.0656066Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0656191Z traceback.print_stack() 2023-01-11T22:51:01.0656426Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0656854Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:01.0656995Z File "", line 1, in 2023-01-11T22:51:01.0657205Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0657351Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0657551Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0657683Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0657893Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0657996Z self.run() 2023-01-11T22:51:01.0658198Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0658343Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0658692Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0658824Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0659269Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0659395Z getattr(self, test_name)() 2023-01-11T22:51:01.0659759Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0659857Z fn() 2023-01-11T22:51:01.0660219Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0660342Z test(self, **param_kwargs) 2023-01-11T22:51:01.0660695Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.0660817Z return func(*args, **kwargs) 2023-01-11T22:51:01.0661036Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0661149Z self.run_subtests( 2023-01-11T22:51:01.0661497Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0661660Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0662085Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0662246Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0662623Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0662740Z output = model(*input) 2023-01-11T22:51:01.0663047Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0663185Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0663560Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0663736Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0664105Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0664227Z _lazy_init(state, module) 2023-01-11T22:51:01.0664577Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0664743Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0665118Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0665259Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.0665592Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0665717Z return func(*args, **kwargs) 2023-01-11T22:51:01.0666089Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0666194Z p_assert( 2023-01-11T22:51:01.0666527Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0666656Z traceback.print_stack() 2023-01-11T22:51:01.0666767Z File "", line 1, in 2023-01-11T22:51:01.0666973Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0667114Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0667315Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0667464Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0667672Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0667775Z self.run() 2023-01-11T22:51:01.0667977Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0668162Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0668501Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0668636Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0668997Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0669119Z getattr(self, test_name)() 2023-01-11T22:51:01.0669472Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0669569Z fn() 2023-01-11T22:51:01.0669930Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0670036Z test(self, **param_kwargs) 2023-01-11T22:51:01.0670387Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:01.0670514Z return func(*args, **kwargs) 2023-01-11T22:51:01.0670750Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0670862Z self.run_subtests( 2023-01-11T22:51:01.0671256Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0671424Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0671770Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0671922Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0672291Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0672411Z output = model(*input) 2023-01-11T22:51:01.0672734Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0672875Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0673246Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0673421Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0673786Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0673888Z _lazy_init(state, module) 2023-01-11T22:51:01.0674239Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0674405Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0674800Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0674940Z handle.init_flat_param_attributes() 
2023-01-11T22:51:01.0675276Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0675403Z return func(*args, **kwargs) 2023-01-11T22:51:01.0675778Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0675864Z p_assert( 2023-01-11T22:51:01.0676196Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0676323Z traceback.print_stack() 2023-01-11T22:51:01.0676557Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0676789Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0676916Z File "", line 1, in 2023-01-11T22:51:01.0677121Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0677244Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0677505Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0677655Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0677869Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0677972Z self.run() 2023-01-11T22:51:01.0678171Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0678315Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0678658Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0678773Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0679132Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0679253Z getattr(self, test_name)() 2023-01-11T22:51:01.0679609Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0679710Z fn() 2023-01-11T22:51:01.0680117Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0680245Z test(self, **param_kwargs) 2023-01-11T22:51:01.0680598Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.0680705Z return func(*args, **kwargs) 2023-01-11T22:51:01.0680944Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0681114Z self.run_subtests( 2023-01-11T22:51:01.0681470Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0681631Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0681989Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0682145Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0682523Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0682625Z output = model(*input) 2023-01-11T22:51:01.0682953Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0683090Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0683463Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0683636Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0684000Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0684121Z 
_lazy_init(state, module) 2023-01-11T22:51:01.0684477Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0684626Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0685025Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0685167Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.0685501Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0685627Z return func(*args, **kwargs) 2023-01-11T22:51:01.0686000Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0686101Z p_assert( 2023-01-11T22:51:01.0686432Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0686599Z traceback.print_stack() 2023-01-11T22:51:01.0686729Z File "", line 1, in 2023-01-11T22:51:01.0686935Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0687079Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0687278Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0687426Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0687638Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0687741Z self.run() 2023-01-11T22:51:01.0687924Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0688068Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0688408Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0688538Z self.run_test(test_name, pipe) 
2023-01-11T22:51:01.0688900Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0689023Z getattr(self, test_name)() 2023-01-11T22:51:01.0689423Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0689509Z fn() 2023-01-11T22:51:01.0689875Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0689996Z test(self, **param_kwargs) 2023-01-11T22:51:01.0690350Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.0690475Z return func(*args, **kwargs) 2023-01-11T22:51:01.0690712Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0690823Z self.run_subtests( 2023-01-11T22:51:01.0691174Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0691321Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0691683Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0691836Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0692208Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0692326Z output = model(*input) 2023-01-11T22:51:01.0692647Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0692783Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0693157Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0693357Z args, 
kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0693734Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0693856Z _lazy_init(state, module) 2023-01-11T22:51:01.0694208Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0694374Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0694767Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0694907Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.0695243Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0695367Z return func(*args, **kwargs) 2023-01-11T22:51:01.0695723Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0695888Z p_assert( 2023-01-11T22:51:01.0696227Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0696355Z traceback.print_stack() 2023-01-11T22:51:01.0697607Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0698370Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:01.0699201Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0699959Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0700686Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0701422Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0702150Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0702877Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0703115Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0703351Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0703481Z File "", line 1, in 2023-01-11T22:51:01.0703678Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0703816Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0704018Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0704166Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0704377Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0704482Z self.run() 2023-01-11T22:51:01.0704681Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0704810Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0705156Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0705361Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0705730Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0705853Z getattr(self, test_name)() 2023-01-11T22:51:01.0706212Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0706311Z fn() 2023-01-11T22:51:01.0706673Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0706780Z test(self, **param_kwargs) 2023-01-11T22:51:01.0707136Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 
2023-01-11T22:51:01.0707259Z return func(*args, **kwargs) 2023-01-11T22:51:01.0707495Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0707612Z self.run_subtests( 2023-01-11T22:51:01.0708010Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0708178Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0708538Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0708672Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0709042Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0709159Z output = model(*input) 2023-01-11T22:51:01.0709479Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0709614Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0709988Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0710163Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0710532Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0710635Z _lazy_init(state, module) 2023-01-11T22:51:01.0710986Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0711151Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0711548Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0711690Z handle.init_flat_param_attributes() 
2023-01-11T22:51:01.0712025Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0712152Z return func(*args, **kwargs) 2023-01-11T22:51:01.0712526Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0712611Z p_assert( 2023-01-11T22:51:01.0712949Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0713074Z traceback.print_stack() 2023-01-11T22:51:01.0713201Z File "", line 1, in 2023-01-11T22:51:01.0713409Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0713548Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0713746Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0713893Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0714085Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0714188Z self.run() 2023-01-11T22:51:01.0714446Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0714592Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0714935Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0715065Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0715424Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0715547Z getattr(self, test_name)() 2023-01-11T22:51:01.0715883Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0715979Z fn() 2023-01-11T22:51:01.0716339Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 248, in instantiated_test 2023-01-11T22:51:01.0716458Z test(self, **param_kwargs) 2023-01-11T22:51:01.0716811Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.0716938Z return func(*args, **kwargs) 2023-01-11T22:51:01.0717231Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0717334Z self.run_subtests( 2023-01-11T22:51:01.0717686Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0717845Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0718203Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0718352Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0718723Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0718842Z output = model(*input) 2023-01-11T22:51:01.0719167Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0719287Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0719663Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0719835Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0720196Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0720315Z _lazy_init(state, module) 2023-01-11T22:51:01.0720664Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0720826Z _share_state_and_init_handle_attrs(state, root_module) 
2023-01-11T22:51:01.0721218Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0721361Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.0721682Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0721807Z return func(*args, **kwargs) 2023-01-11T22:51:01.0722183Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0722283Z p_assert( 2023-01-11T22:51:01.0722621Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0722745Z traceback.print_stack() 2023-01-11T22:51:01.0722980Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0723213Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:01.0723325Z File "", line 1, in 2023-01-11T22:51:01.0723590Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0723733Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0723936Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0724085Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0724292Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0724394Z self.run() 2023-01-11T22:51:01.0724576Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0724720Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0725061Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0725194Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0725554Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0725679Z getattr(self, test_name)() 2023-01-11T22:51:01.0726034Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0726179Z fn() 2023-01-11T22:51:01.0726531Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0726654Z test(self, **param_kwargs) 2023-01-11T22:51:01.0727008Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.0727131Z return func(*args, **kwargs) 2023-01-11T22:51:01.0727367Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0727480Z self.run_subtests( 2023-01-11T22:51:01.0727829Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0727994Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0728335Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0728490Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0728868Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0728986Z output = model(*input) 2023-01-11T22:51:01.0729309Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0729445Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0729818Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0729990Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0730338Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0730461Z _lazy_init(state, module) 2023-01-11T22:51:01.0730813Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0730981Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0731376Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0731517Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.0731851Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0731974Z return func(*args, **kwargs) 2023-01-11T22:51:01.0732329Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0732432Z p_assert( 2023-01-11T22:51:01.0732826Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0732953Z traceback.print_stack() 2023-01-11T22:51:01.0733081Z File "", line 1, in 2023-01-11T22:51:01.0733289Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0733429Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0733627Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0733760Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0733967Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0734072Z self.run() 2023-01-11T22:51:01.0734269Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0734413Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0734746Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0734882Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0735272Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0735402Z getattr(self, test_name)() 2023-01-11T22:51:01.0735760Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0735858Z fn() 2023-01-11T22:51:01.0736218Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0736338Z test(self, **param_kwargs) 2023-01-11T22:51:01.0736880Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:01.0737012Z return func(*args, **kwargs) 2023-01-11T22:51:01.0737235Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0737354Z self.run_subtests( 2023-01-11T22:51:01.0737713Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0737879Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0738241Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0738394Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0738764Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0738884Z output = model(*input) 2023-01-11T22:51:01.0739189Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0739328Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0739699Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0739875Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0740241Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0740360Z _lazy_init(state, module) 2023-01-11T22:51:01.0740709Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0740874Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0741248Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0741388Z handle.init_flat_param_attributes() 
2023-01-11T22:51:01.0741721Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0741932Z return func(*args, **kwargs) 2023-01-11T22:51:01.0742311Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0742414Z p_assert( 2023-01-11T22:51:01.0742751Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0742876Z traceback.print_stack() 2023-01-11T22:51:01.0743096Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0743327Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0743454Z File "", line 1, in 2023-01-11T22:51:01.0743663Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0743803Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0744000Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0744153Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0744362Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0744448Z self.run() 2023-01-11T22:51:01.0744707Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0744863Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0745205Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0745338Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0745696Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0745819Z getattr(self, test_name)() 2023-01-11T22:51:01.0746175Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0746257Z fn() 2023-01-11T22:51:01.0746622Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0746744Z test(self, **param_kwargs) 2023-01-11T22:51:01.0747100Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.0747225Z return func(*args, **kwargs) 2023-01-11T22:51:01.0747461Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0747573Z self.run_subtests( 2023-01-11T22:51:01.0747925Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0748069Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0748426Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0748576Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0748952Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0749070Z output = model(*input) 2023-01-11T22:51:01.0749397Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0749533Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0749909Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0750064Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0750425Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0750545Z 
_lazy_init(state, module) 2023-01-11T22:51:01.0750894Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0751118Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0751517Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0751657Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.0751994Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0752101Z return func(*args, **kwargs) 2023-01-11T22:51:01.0752474Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0752574Z p_assert( 2023-01-11T22:51:01.0752908Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0753033Z traceback.print_stack() 2023-01-11T22:51:01.0753161Z File "", line 1, in 2023-01-11T22:51:01.0753369Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0753509Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0753737Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0753895Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0754105Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0754209Z self.run() 2023-01-11T22:51:01.0754411Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0754556Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0754894Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0755009Z self.run_test(test_name, pipe) 
2023-01-11T22:51:01.0755368Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0755493Z getattr(self, test_name)() 2023-01-11T22:51:01.0755850Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0755950Z fn() 2023-01-11T22:51:01.0756311Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0756432Z test(self, **param_kwargs) 2023-01-11T22:51:01.0756783Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.0756889Z return func(*args, **kwargs) 2023-01-11T22:51:01.0757125Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0757236Z self.run_subtests( 2023-01-11T22:51:01.0757583Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0757747Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0758105Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0758258Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0758630Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0758734Z output = model(*input) 2023-01-11T22:51:01.0759059Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0759196Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0759570Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0759742Z args, 
kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0760107Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0760280Z _lazy_init(state, module) 2023-01-11T22:51:01.0760638Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0760789Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0761184Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0761326Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.0761660Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0761782Z return func(*args, **kwargs) 2023-01-11T22:51:01.0762154Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0762256Z p_assert( 2023-01-11T22:51:01.0762590Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0762698Z traceback.print_stack() 2023-01-11T22:51:01.0763486Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0764231Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:01.0764969Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0765716Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0766447Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0767175Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0767905Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0768630Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0769352Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0770171Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0770895Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0771620Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0772387Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0773116Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0773836Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0774564Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0774803Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0775037Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:01.0775167Z File "", line 1, in 2023-01-11T22:51:01.0775378Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0775518Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0775720Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0775855Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0776068Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0776170Z self.run() 2023-01-11T22:51:01.0776370Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0776515Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0777345Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0777484Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0777835Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0777959Z getattr(self, test_name)() 2023-01-11T22:51:01.0778314Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0778510Z fn() 2023-01-11T22:51:01.0778876Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0779002Z test(self, **param_kwargs) 2023-01-11T22:51:01.0779356Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.0779480Z return func(*args, **kwargs) 2023-01-11T22:51:01.0779700Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0779813Z self.run_subtests( 2023-01-11T22:51:01.0780165Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0780324Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0780686Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0780841Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0781213Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0781405Z output = model(*input) 2023-01-11T22:51:01.0781728Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0781869Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0782242Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0782414Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0782780Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0782899Z _lazy_init(state, module) 2023-01-11T22:51:01.0783248Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0783417Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0783796Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0783938Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.0784275Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0784401Z return func(*args, **kwargs) 2023-01-11T22:51:01.0784772Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0784875Z p_assert( 2023-01-11T22:51:01.0785209Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0785334Z traceback.print_stack() 2023-01-11T22:51:01.0785445Z File "", line 1, in 2023-01-11T22:51:01.0785656Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0785796Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0785997Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0786145Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0786357Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0786464Z self.run() 2023-01-11T22:51:01.0786666Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0786793Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0787129Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0787260Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0787617Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0787793Z getattr(self, test_name)() 2023-01-11T22:51:01.0788157Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0788253Z fn() 2023-01-11T22:51:01.0788596Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0788718Z test(self, **param_kwargs) 2023-01-11T22:51:01.0789069Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:01.0789193Z return func(*args, **kwargs) 2023-01-11T22:51:01.0789429Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0789542Z self.run_subtests( 2023-01-11T22:51:01.0789890Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0790055Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0790445Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0790605Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0790978Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0791100Z output = model(*input) 2023-01-11T22:51:01.0791420Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0791557Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0791929Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0792103Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0792473Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0792575Z _lazy_init(state, module) 2023-01-11T22:51:01.0792923Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0793087Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0793532Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0793674Z handle.init_flat_param_attributes() 
2023-01-11T22:51:01.0794013Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0794137Z return func(*args, **kwargs) 2023-01-11T22:51:01.0794511Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0794600Z p_assert( 2023-01-11T22:51:01.0794931Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0795056Z traceback.print_stack() 2023-01-11T22:51:01.0795295Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0795528Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0795655Z File "", line 1, in 2023-01-11T22:51:01.0795864Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0795987Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0796187Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0796334Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0796545Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0796715Z self.run() 2023-01-11T22:51:01.0796916Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0797060Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0797408Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0797524Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0797882Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0798003Z getattr(self, test_name)() 2023-01-11T22:51:01.0798360Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0798456Z fn() 2023-01-11T22:51:01.0798818Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0798939Z test(self, **param_kwargs) 2023-01-11T22:51:01.0799296Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.0799402Z return func(*args, **kwargs) 2023-01-11T22:51:01.0799686Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0799806Z self.run_subtests( 2023-01-11T22:51:01.0800158Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0800319Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0800677Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0800827Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0801198Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0801302Z output = model(*input) 2023-01-11T22:51:01.0801625Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0801760Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0802134Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0802307Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0802670Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0802789Z 
_lazy_init(state, module) 2023-01-11T22:51:01.0803138Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0803286Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0803682Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0803826Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.0804167Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0804290Z return func(*args, **kwargs) 2023-01-11T22:51:01.0804662Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0804764Z p_assert( 2023-01-11T22:51:01.0805097Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0805206Z traceback.print_stack() 2023-01-11T22:51:01.0805336Z File "", line 1, in 2023-01-11T22:51:01.0805539Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0805682Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0805880Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0806083Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0806291Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0806399Z self.run() 2023-01-11T22:51:01.0806584Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0806728Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0807067Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0807198Z self.run_test(test_name, pipe) 
2023-01-11T22:51:01.0807556Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0807677Z getattr(self, test_name)() 2023-01-11T22:51:01.0808032Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0808117Z fn() 2023-01-11T22:51:01.0808475Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0808640Z test(self, **param_kwargs) 2023-01-11T22:51:01.0809002Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.0809128Z return func(*args, **kwargs) 2023-01-11T22:51:01.0809364Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0809476Z self.run_subtests( 2023-01-11T22:51:01.0809823Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0809966Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0810327Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0810482Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0810860Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0810982Z output = model(*input) 2023-01-11T22:51:01.0811304Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0811440Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0811812Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0811968Z args, 
kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0812332Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0812451Z _lazy_init(state, module) 2023-01-11T22:51:01.0812804Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0812974Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0813371Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0813513Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.0813846Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0813971Z return func(*args, **kwargs) 2023-01-11T22:51:01.0814327Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0814431Z p_assert( 2023-01-11T22:51:01.0814761Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0814883Z traceback.print_stack() 2023-01-11T22:51:01.0815117Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0815406Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:01.0815538Z File "", line 1, in 2023-01-11T22:51:01.0815730Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0815873Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0816072Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0816223Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0816434Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0816726Z self.run() 2023-01-11T22:51:01.0816948Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0817093Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0817426Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0817563Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0817995Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0818128Z getattr(self, test_name)() 2023-01-11T22:51:01.0818490Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0818588Z fn() 2023-01-11T22:51:01.0818949Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0819070Z test(self, **param_kwargs) 2023-01-11T22:51:01.0819408Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.0819532Z return func(*args, **kwargs) 2023-01-11T22:51:01.0819767Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0819884Z self.run_subtests( 2023-01-11T22:51:01.0820236Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0820399Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0820757Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0820905Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0821261Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0821380Z output = model(*input) 2023-01-11T22:51:01.0821701Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0821836Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0822207Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0822383Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0822749Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0822870Z _lazy_init(state, module) 2023-01-11T22:51:01.0823204Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0823369Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0823763Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0823905Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.0824239Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0824439Z return func(*args, **kwargs) 2023-01-11T22:51:01.0824817Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0824924Z p_assert( 2023-01-11T22:51:01.0825242Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0825367Z traceback.print_stack() 2023-01-11T22:51:01.0825494Z File "", line 1, in 2023-01-11T22:51:01.0825699Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0825840Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0826038Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0826187Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0826377Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0826484Z self.run() 2023-01-11T22:51:01.0826682Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0826828Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0827211Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0827350Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0827710Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0827832Z getattr(self, test_name)() 2023-01-11T22:51:01.0828169Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0828268Z fn() 2023-01-11T22:51:01.0828629Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0828750Z test(self, **param_kwargs) 2023-01-11T22:51:01.0829108Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:01.0829231Z return func(*args, **kwargs) 2023-01-11T22:51:01.0829468Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0829579Z self.run_subtests( 2023-01-11T22:51:01.0829912Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0830071Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0830430Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0830579Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0830948Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0831067Z output = model(*input) 2023-01-11T22:51:01.0831391Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0831528Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0831886Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0832058Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0832418Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0832538Z _lazy_init(state, module) 2023-01-11T22:51:01.0832885Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0833053Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0833444Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0833640Z handle.init_flat_param_attributes() 
2023-01-11T22:51:01.0833964Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0834091Z return func(*args, **kwargs) 2023-01-11T22:51:01.0834464Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0834564Z p_assert( 2023-01-11T22:51:01.0834899Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0835023Z traceback.print_stack() 2023-01-11T22:51:01.0835768Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0836572Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0837317Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0838052Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:01.0838789Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0839514Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0840238Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0840970Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0841206Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0841440Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:01.0841553Z File "", line 1, in 2023-01-11T22:51:01.0841763Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0841906Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0842107Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0842256Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0842522Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0842627Z self.run() 2023-01-11T22:51:01.0842813Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0842959Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0843302Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0843435Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0843793Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0843916Z getattr(self, test_name)() 2023-01-11T22:51:01.0844269Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0844364Z fn() 2023-01-11T22:51:01.0844707Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0844832Z test(self, **param_kwargs) 2023-01-11T22:51:01.0845244Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.0845376Z return func(*args, **kwargs) 2023-01-11T22:51:01.0845615Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0845728Z self.run_subtests( 2023-01-11T22:51:01.0846080Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0846240Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0846582Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0846734Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0847105Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0847226Z output = model(*input) 2023-01-11T22:51:01.0847552Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0847689Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0848061Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0848238Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0848585Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0848701Z _lazy_init(state, module) 2023-01-11T22:51:01.0849052Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0849217Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0849617Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0849759Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.0850094Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0850216Z return func(*args, **kwargs) 2023-01-11T22:51:01.0850572Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0850672Z p_assert( 2023-01-11T22:51:01.0851005Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0851129Z traceback.print_stack() 2023-01-11T22:51:01.0851256Z File "", line 1, in 2023-01-11T22:51:01.0851463Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0851661Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0851860Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0851997Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0852209Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0852311Z self.run() 2023-01-11T22:51:01.0852510Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0852654Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0852992Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0853123Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0853464Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0853587Z getattr(self, test_name)() 2023-01-11T22:51:01.0853951Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0854049Z fn() 2023-01-11T22:51:01.0854459Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0854585Z test(self, **param_kwargs) 2023-01-11T22:51:01.0854941Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:01.0855063Z return func(*args, **kwargs) 2023-01-11T22:51:01.0855283Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0855395Z self.run_subtests( 2023-01-11T22:51:01.0855743Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0855901Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0856264Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0856415Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0857034Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0857157Z output = model(*input) 2023-01-11T22:51:01.0857469Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0857606Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0857981Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0858154Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0858519Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0858639Z _lazy_init(state, module) 2023-01-11T22:51:01.0858989Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0859155Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0859530Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0859673Z handle.init_flat_param_attributes() 
2023-01-11T22:51:01.0860005Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0860132Z return func(*args, **kwargs) 2023-01-11T22:51:01.0860505Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0860607Z p_assert( 2023-01-11T22:51:01.0860937Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0861150Z traceback.print_stack() 2023-01-11T22:51:01.0861369Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0861608Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0861737Z File "", line 1, in 2023-01-11T22:51:01.0861944Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0862083Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0862285Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0862435Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0862644Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0862730Z self.run() 2023-01-11T22:51:01.0862930Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0863076Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0863416Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0863615Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0863986Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0864108Z getattr(self, test_name)() 2023-01-11T22:51:01.0864465Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0864545Z fn() 2023-01-11T22:51:01.0864904Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0865024Z test(self, **param_kwargs) 2023-01-11T22:51:01.0865377Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.0865506Z return func(*args, **kwargs) 2023-01-11T22:51:01.0865743Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0865859Z self.run_subtests( 2023-01-11T22:51:01.0866194Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0866353Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0866711Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0866863Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0867233Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0867353Z output = model(*input) 2023-01-11T22:51:01.0867677Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0867816Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0868192Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0868349Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0868711Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0868829Z 
_lazy_init(state, module) 2023-01-11T22:51:01.0869177Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0869342Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0869734Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0869873Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.0870274Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0870383Z return func(*args, **kwargs) 2023-01-11T22:51:01.0870761Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0870863Z p_assert( 2023-01-11T22:51:01.0871196Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0871322Z traceback.print_stack() 2023-01-11T22:51:01.0871449Z File "", line 1, in 2023-01-11T22:51:01.0871654Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0871778Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0871975Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0872123Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0872332Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0872434Z self.run() 2023-01-11T22:51:01.0872682Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0872834Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0873173Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0873287Z self.run_test(test_name, pipe) 
2023-01-11T22:51:01.0873649Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0873772Z getattr(self, test_name)() 2023-01-11T22:51:01.0874123Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0874220Z fn() 2023-01-11T22:51:01.0874579Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0874705Z test(self, **param_kwargs) 2023-01-11T22:51:01.0875058Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.0875167Z return func(*args, **kwargs) 2023-01-11T22:51:01.0875403Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0875514Z self.run_subtests( 2023-01-11T22:51:01.0875859Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0876021Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0876379Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0876527Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0876895Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0877003Z output = model(*input) 2023-01-11T22:51:01.0877332Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0877469Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0877841Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0878013Z args, 
kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0878375Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0878493Z _lazy_init(state, module) 2023-01-11T22:51:01.0878841Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0878991Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0879449Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0879593Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.0879927Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0880051Z return func(*args, **kwargs) 2023-01-11T22:51:01.0880421Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0880521Z p_assert( 2023-01-11T22:51:01.0880849Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0880957Z traceback.print_stack() 2023-01-11T22:51:01.0881192Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0881425Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0882220Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0882972Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0883706Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0884446Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0885174Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0885900Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0886630Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:01.0887358Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0888080Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0888861Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0889588Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0890309Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0890545Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0890822Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:01.0891059Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0891287Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0891497Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0891724Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0892452Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0893187Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0893972Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0894700Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0894934Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:01.0895167Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0895393Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0895618Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0896343Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0897312Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0898138Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0898859Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0899090Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0899320Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:01.0899536Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0899823Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0900058Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0900282Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0901011Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0901736Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0902467Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0903193Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0903425Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:01.0903659Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0903884Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0904112Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0904320Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0904545Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0904767Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0904991Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0905722Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0906511Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0907238Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0907965Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0908199Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0908515Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0908748Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0908976Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0909183Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0909407Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0910135Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0910863Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0911593Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:01.0912323Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0912556Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0912787Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0913016Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0913243Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0913970Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0914693Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0915479Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0916207Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0916437Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0916669Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0916921Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0917159Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0917382Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0917608Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0918340Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0919068Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0919799Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:01.0920522Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0920753Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0920988Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0921215Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0921435Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0921643Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0921867Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0921978Z dist init r=0, world=2 2023-01-11T22:51:01.0922303Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:01.0922618Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:01.0922978Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:01.0923284Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:01.0923584Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:01.0923885Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:01.0924179Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:01.0924463Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:01.0924805Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:01.0925111Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:01.0925410Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:01.0925709Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:01.0925817Z dist init r=1, world=2 2023-01-11T22:51:01.0926143Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:01.0926462Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:01.0926766Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:01.0927068Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:01.0927367Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:01.0927669Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:01.0927957Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:01.0928255Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:01.0928553Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:01.0928856Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:01.0929153Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:01.0929503Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:01.0929609Z ok (10.821s) 2023-01-11T22:51:01.0929935Z test_transformer_offload_true_shard_grad_op (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 96269 2023-01-11T22:51:01.0930150Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 96270 2023-01-11T22:51:01.0930529Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:01.0930686Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:01.0931065Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:01.0931252Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:01.0931622Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 56 slow tests 2023-01-11T22:51:01.0931841Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:51:01.0932223Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:51:01.0932411Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:51:01.0932656Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:51:01.0932899Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:51:01.0933279Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:51:01.0933672Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:51:01.0933904Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:51:01.0934130Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:51:01.0934364Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0934593Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0935606Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:51:01.0935722Z warnings.warn( 2023-01-11T22:51:01.0936917Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:51:01.0937037Z warnings.warn( 2023-01-11T22:51:01.0937169Z File "", line 1, in 2023-01-11T22:51:01.0937365Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0937505Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0937706Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0937849Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0938157Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0938258Z self.run() 2023-01-11T22:51:01.0938457Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0938591Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0938944Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0939078Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0939439Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0939564Z getattr(self, test_name)() 2023-01-11T22:51:01.0939921Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0940018Z fn() 2023-01-11T22:51:01.0940381Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0940490Z test(self, **param_kwargs) 2023-01-11T22:51:01.0940845Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.0941033Z return func(*args, **kwargs) 2023-01-11T22:51:01.0941279Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0941392Z self.run_subtests( 2023-01-11T22:51:01.0941746Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0941908Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0942269Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0942406Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0942775Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0942899Z output = model(*input) 2023-01-11T22:51:01.0943224Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0943363Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0943736Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0943910Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0944275Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0944380Z _lazy_init(state, module) 2023-01-11T22:51:01.0944731Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0944897Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0945295Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0945435Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.0945773Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0945897Z return func(*args, **kwargs) 2023-01-11T22:51:01.0946269Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0946354Z p_assert( 2023-01-11T22:51:01.0946686Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0946810Z traceback.print_stack() 2023-01-11T22:51:01.0946935Z File "", line 1, in 2023-01-11T22:51:01.0947143Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0947343Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0947546Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0947695Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0947891Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0947996Z self.run() 2023-01-11T22:51:01.0948196Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0948341Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0948680Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0948810Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0949169Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0949275Z getattr(self, test_name)() 2023-01-11T22:51:01.0949631Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0949732Z fn() 2023-01-11T22:51:01.0950138Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0950265Z test(self, **param_kwargs) 2023-01-11T22:51:01.0950623Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:01.0950746Z return func(*args, **kwargs) 2023-01-11T22:51:01.0950983Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0951079Z self.run_subtests( 2023-01-11T22:51:01.0951431Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0951591Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0951951Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0952105Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0952479Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0952597Z output = model(*input) 2023-01-11T22:51:01.0952924Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0953044Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0953418Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0953590Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0953955Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0954074Z _lazy_init(state, module) 2023-01-11T22:51:01.0954431Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0954601Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0954996Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0955137Z handle.init_flat_param_attributes() 
2023-01-11T22:51:01.0955456Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0955580Z return func(*args, **kwargs) 2023-01-11T22:51:01.0955952Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0956053Z p_assert( 2023-01-11T22:51:01.0956384Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0956561Z traceback.print_stack() 2023-01-11T22:51:01.0956796Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0957015Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0957763Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0958502Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0959283Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0960034Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0960766Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0961496Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0962223Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0962945Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0963667Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:01.0964405Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.0964538Z File "<string>", line 1, in <module> 2023-01-11T22:51:01.0964731Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0964871Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0965072Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0965222Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0965493Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0965596Z self.run() 2023-01-11T22:51:01.0965801Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0965948Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0966273Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0966404Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0966770Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0966894Z getattr(self, test_name)() 2023-01-11T22:51:01.0967250Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0967349Z fn() 2023-01-11T22:51:01.0967715Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0967841Z test(self, **param_kwargs) 2023-01-11T22:51:01.0968239Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line
167, in wrapper 2023-01-11T22:51:01.0968373Z return func(*args, **kwargs) 2023-01-11T22:51:01.0968608Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0968724Z self.run_subtests( 2023-01-11T22:51:01.0969076Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0969236Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0969596Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0969745Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0970105Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0970226Z output = model(*input) 2023-01-11T22:51:01.0970554Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0970692Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0971069Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0971240Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0971604Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0971725Z _lazy_init(state, module) 2023-01-11T22:51:01.0972060Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0972227Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0972621Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0972765Z 
handle.init_flat_param_attributes() 2023-01-11T22:51:01.0973101Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0973227Z return func(*args, **kwargs) 2023-01-11T22:51:01.0973604Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0973702Z p_assert( 2023-01-11T22:51:01.0974019Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0974145Z traceback.print_stack() 2023-01-11T22:51:01.0974274Z File "<string>", line 1, in <module> 2023-01-11T22:51:01.0974479Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0974678Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0974879Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0975031Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0975224Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0975328Z self.run() 2023-01-11T22:51:01.0975527Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0975672Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0976014Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0976146Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0976505Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0976846Z getattr(self, test_name)() 2023-01-11T22:51:01.0977203Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0977301Z fn() 2023-01-11T22:51:01.0977733Z File
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0977864Z test(self, **param_kwargs) 2023-01-11T22:51:01.0978217Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.0978342Z return func(*args, **kwargs) 2023-01-11T22:51:01.0978581Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0978695Z self.run_subtests( 2023-01-11T22:51:01.0979025Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0979186Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0979553Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0979704Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0980079Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0980195Z output = model(*input) 2023-01-11T22:51:01.0980520Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0980656Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0981065Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0981244Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0981612Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0981735Z _lazy_init(state, module) 2023-01-11T22:51:01.0982085Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 
2023-01-11T22:51:01.0982256Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0982654Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0982795Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.0983113Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0983237Z return func(*args, **kwargs) 2023-01-11T22:51:01.0983610Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0983712Z p_assert( 2023-01-11T22:51:01.0984046Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0984249Z traceback.print_stack() 2023-01-11T22:51:01.0984485Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.0984722Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:01.0984834Z File "<string>", line 1, in <module> 2023-01-11T22:51:01.0985043Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0985183Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0985382Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0985529Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0985741Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0985846Z self.run() 2023-01-11T22:51:01.0986046Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0986177Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0986565Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0986706Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0987067Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0987189Z getattr(self, test_name)() 2023-01-11T22:51:01.0987547Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0987646Z fn() 2023-01-11T22:51:01.0987989Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0988114Z test(self, **param_kwargs) 2023-01-11T22:51:01.0988469Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.0988598Z return func(*args, **kwargs) 2023-01-11T22:51:01.0988833Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0988952Z self.run_subtests( 2023-01-11T22:51:01.0989302Z File
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0989466Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0989808Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0989958Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.0990331Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.0990447Z output = model(*input) 2023-01-11T22:51:01.0990769Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.0990910Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.0991287Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.0991458Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.0991804Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.0991924Z _lazy_init(state, module) 2023-01-11T22:51:01.0992269Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.0992432Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.0992826Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.0992966Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.0993408Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.0993538Z return func(*args, **kwargs) 2023-01-11T22:51:01.0993654Z File "<string>", line 1, in <module> 2023-01-11T22:51:01.0994035Z File
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.0994138Z p_assert( 2023-01-11T22:51:01.0994469Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.0994596Z traceback.print_stack() 2023-01-11T22:51:01.0994801Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.0994942Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.0995139Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.0995271Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.0995482Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.0995585Z self.run() 2023-01-11T22:51:01.0995834Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.0995986Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.0996321Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.0996451Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.0996807Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.0996912Z getattr(self, test_name)() 2023-01-11T22:51:01.0997266Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.0997359Z fn() 2023-01-11T22:51:01.0997720Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.0997844Z test(self, **param_kwargs) 2023-01-11T22:51:01.0998202Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.0998326Z return func(*args, 
**kwargs) 2023-01-11T22:51:01.0998564Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.0998659Z self.run_subtests( 2023-01-11T22:51:01.0999011Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.0999171Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.0999530Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.0999681Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.1000056Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.1000174Z output = model(*input) 2023-01-11T22:51:01.1000496Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.1000615Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.1000989Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.1001159Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.1001523Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.1001642Z _lazy_init(state, module) 2023-01-11T22:51:01.1001992Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.1002157Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.1002611Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.1002739Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.1003072Z File 
"/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.1003198Z return func(*args, **kwargs) 2023-01-11T22:51:01.1003571Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.1003674Z p_assert( 2023-01-11T22:51:01.1004006Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.1004133Z traceback.print_stack() 2023-01-11T22:51:01.1004368Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1004589Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1004719Z File "<string>", line 1, in <module> 2023-01-11T22:51:01.1004974Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.1005124Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.1005325Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.1005474Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.1005681Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.1005767Z self.run() 2023-01-11T22:51:01.1005966Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.1006110Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.1006453Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.1006592Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.1006946Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.1007072Z getattr(self, test_name)() 2023-01-11T22:51:01.1007427Z File
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.1007508Z fn() 2023-01-11T22:51:01.1007863Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.1007983Z test(self, **param_kwargs) 2023-01-11T22:51:01.1008336Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.1008461Z return func(*args, **kwargs) 2023-01-11T22:51:01.1008693Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.1008805Z self.run_subtests( 2023-01-11T22:51:01.1009158Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.1009305Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.1009662Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.1009819Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.1010191Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.1010310Z output = model(*input) 2023-01-11T22:51:01.1010628Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.1010765Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.1011137Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.1011360Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.1011729Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.1011849Z 
_lazy_init(state, module) 2023-01-11T22:51:01.1012201Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.1012368Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.1012763Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.1012905Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.1013242Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.1013350Z return func(*args, **kwargs) 2023-01-11T22:51:01.1013724Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.1013831Z p_assert( 2023-01-11T22:51:01.1014210Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.1014342Z traceback.print_stack() 2023-01-11T22:51:01.1014471Z File "<string>", line 1, in <module> 2023-01-11T22:51:01.1014682Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.1014822Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.1015004Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.1015151Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.1015362Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.1015465Z self.run() 2023-01-11T22:51:01.1015665Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.1015812Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.1016150Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.1016287Z self.run_test(test_name, pipe)
2023-01-11T22:51:01.1016964Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.1017095Z getattr(self, test_name)() 2023-01-11T22:51:01.1017460Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.1017557Z fn() 2023-01-11T22:51:01.1017918Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.1018041Z test(self, **param_kwargs) 2023-01-11T22:51:01.1018394Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.1018505Z return func(*args, **kwargs) 2023-01-11T22:51:01.1018740Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.1018857Z self.run_subtests( 2023-01-11T22:51:01.1019209Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.1019371Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.1019728Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.1019877Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.1020247Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.1020348Z output = model(*input) 2023-01-11T22:51:01.1020671Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.1020897Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.1021276Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.1021448Z args, 
kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.1021813Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.1021932Z _lazy_init(state, module) 2023-01-11T22:51:01.1022281Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.1022446Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.1022822Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.1022966Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.1023308Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.1023433Z return func(*args, **kwargs) 2023-01-11T22:51:01.1023868Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.1023977Z p_assert( 2023-01-11T22:51:01.1024315Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.1024439Z traceback.print_stack() 2023-01-11T22:51:01.1025171Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1025918Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:01.1026661Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1027393Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1028126Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1028859Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1029584Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1030315Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1030608Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1030841Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1030972Z File "<string>", line 1, in <module> 2023-01-11T22:51:01.1031182Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.1031324Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.1031525Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.1031675Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.1031870Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.1031975Z self.run() 2023-01-11T22:51:01.1032176Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.1032380Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.1032731Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.1032863Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.1033224Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.1033330Z getattr(self, test_name)() 2023-01-11T22:51:01.1033688Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.1033786Z fn() 2023-01-11T22:51:01.1034148Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.1034269Z test(self, **param_kwargs) 2023-01-11T22:51:01.1034626Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper
2023-01-11T22:51:01.1034751Z return func(*args, **kwargs) 2023-01-11T22:51:01.1034991Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.1035088Z self.run_subtests( 2023-01-11T22:51:01.1035435Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.1035597Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.1035959Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.1036111Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.1036484Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.1036606Z output = model(*input) 2023-01-11T22:51:01.1036927Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.1037050Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.1037424Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.1037596Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.1037959Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.1038077Z _lazy_init(state, module) 2023-01-11T22:51:01.1038425Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.1038591Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.1038985Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.1039183Z handle.init_flat_param_attributes() 
2023-01-11T22:51:01.1039510Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.1039635Z return func(*args, **kwargs) 2023-01-11T22:51:01.1040013Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.1040116Z p_assert( 2023-01-11T22:51:01.1040450Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.1040576Z traceback.print_stack() 2023-01-11T22:51:01.1040705Z File "", line 1, in 2023-01-11T22:51:01.1040894Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.1041036Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.1041242Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.1041392Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.1041647Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.1041756Z self.run() 2023-01-11T22:51:01.1041958Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.1042103Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.1042426Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.1042560Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.1042919Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.1043043Z getattr(self, test_name)() 2023-01-11T22:51:01.1043397Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.1043498Z fn() 2023-01-11T22:51:01.1043858Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 248, in instantiated_test 2023-01-11T22:51:01.1043982Z test(self, **param_kwargs) 2023-01-11T22:51:01.1044319Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.1044445Z return func(*args, **kwargs) 2023-01-11T22:51:01.1044680Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.1044794Z self.run_subtests( 2023-01-11T22:51:01.1045142Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.1045304Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.1045668Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.1045825Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.1046182Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.1046301Z output = model(*input) 2023-01-11T22:51:01.1046624Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.1046761Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.1047132Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.1047305Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.1047669Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.1047790Z _lazy_init(state, module) 2023-01-11T22:51:01.1048123Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.1048376Z _share_state_and_init_handle_attrs(state, root_module) 
2023-01-11T22:51:01.1048779Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.1048919Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.1049258Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.1049384Z return func(*args, **kwargs) 2023-01-11T22:51:01.1049757Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.1049858Z p_assert( 2023-01-11T22:51:01.1050175Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.1050301Z traceback.print_stack() 2023-01-11T22:51:01.1050537Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1050766Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:01.1050939Z File "", line 1, in 2023-01-11T22:51:01.1051154Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.1051295Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.1051495Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.1051627Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.1051837Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.1051940Z self.run() 2023-01-11T22:51:01.1052141Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.1052287Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.1052632Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.1052764Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.1053108Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.1053232Z getattr(self, test_name)() 2023-01-11T22:51:01.1053590Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.1053685Z fn() 2023-01-11T22:51:01.1054041Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.1054164Z test(self, **param_kwargs) 2023-01-11T22:51:01.1054516Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.1054637Z return func(*args, **kwargs) 2023-01-11T22:51:01.1054860Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.1054972Z self.run_subtests( 2023-01-11T22:51:01.1055324Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.1055482Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.1055843Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.1055993Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.1056363Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.1056480Z output = model(*input) 2023-01-11T22:51:01.1057065Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.1057206Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.1057677Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.1057852Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.1058217Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.1058339Z _lazy_init(state, module) 2023-01-11T22:51:01.1058688Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.1058855Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.1059233Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.1059372Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.1059708Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.1059833Z return func(*args, **kwargs) 2023-01-11T22:51:01.1060265Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.1060375Z p_assert( 2023-01-11T22:51:01.1060713Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.1060837Z traceback.print_stack() 2023-01-11T22:51:01.1060948Z File "", line 1, in 2023-01-11T22:51:01.1061156Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.1061298Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.1061498Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.1061646Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.1061855Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.1061962Z self.run() 2023-01-11T22:51:01.1062165Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.1062296Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.1062640Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.1062768Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.1063129Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.1063252Z getattr(self, test_name)() 2023-01-11T22:51:01.1063608Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.1063706Z fn() 2023-01-11T22:51:01.1064049Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.1064169Z test(self, **param_kwargs) 2023-01-11T22:51:01.1064527Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:01.1064651Z return func(*args, **kwargs) 2023-01-11T22:51:01.1064888Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.1065003Z self.run_subtests( 2023-01-11T22:51:01.1065353Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.1065513Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.1065852Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.1066001Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.1066372Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.1066546Z output = model(*input) 2023-01-11T22:51:01.1066871Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.1067012Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.1067384Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.1067553Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.1067899Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.1068019Z _lazy_init(state, module) 2023-01-11T22:51:01.1068364Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.1068531Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.1068925Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.1069072Z handle.init_flat_param_attributes() 
2023-01-11T22:51:01.1069459Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.1069590Z return func(*args, **kwargs) 2023-01-11T22:51:01.1069973Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.1070058Z p_assert( 2023-01-11T22:51:01.1070391Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.1070516Z traceback.print_stack() 2023-01-11T22:51:01.1070750Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1070984Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1071114Z File "", line 1, in 2023-01-11T22:51:01.1071243Z File "", line 1, in 2023-01-11T22:51:01.1071433Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.1071577Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.1071776Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.1071926Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.1072129Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.1072268Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.1072476Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.1072582Z self.run() 2023-01-11T22:51:01.1072762Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.1072909Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.1073113Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.1073257Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.1073467Z File 
"/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.1073571Z self.run() 2023-01-11T22:51:01.1073913Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.1074028Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.1074225Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.1074369Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.1074729Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.1074854Z getattr(self, test_name)() 2023-01-11T22:51:01.1075186Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.1075368Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.1075727Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.1075811Z fn() 2023-01-11T22:51:01.1076170Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.1076294Z getattr(self, test_name)() 2023-01-11T22:51:01.1076654Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.1076773Z test(self, **param_kwargs) 2023-01-11T22:51:01.1077122Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.1077215Z fn() 2023-01-11T22:51:01.1077569Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.1077680Z return func(*args, **kwargs) 2023-01-11T22:51:01.1078036Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 
2023-01-11T22:51:01.1078200Z test(self, **param_kwargs) 2023-01-11T22:51:01.1078439Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.1078552Z self.run_subtests( 2023-01-11T22:51:01.1078908Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.1079033Z return func(*args, **kwargs) 2023-01-11T22:51:01.1079378Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.1079522Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.1079759Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.1079876Z self.run_subtests( 2023-01-11T22:51:01.1080241Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.1080394Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.1080741Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.1080900Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.1081274Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.1081376Z output = model(*input) 2023-01-11T22:51:01.1081729Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.1081879Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.1082199Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.1082341Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.1082715Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.1082835Z output = model(*input) 2023-01-11T22:51:01.1083207Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.1083364Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.1083686Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.1083824Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.1084184Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.1084304Z _lazy_init(state, module) 2023-01-11T22:51:01.1084676Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.1084902Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.1085258Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.1085408Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.1085773Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.1085895Z _lazy_init(state, module) 2023-01-11T22:51:01.1086290Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.1086431Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.1086777Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.1086945Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.1087280Z File 
"/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.1087433Z return func(*args, **kwargs) 2023-01-11T22:51:01.1087838Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.1087979Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.1088356Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.1088457Z p_assert( 2023-01-11T22:51:01.1088792Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.1088917Z return func(*args, **kwargs) 2023-01-11T22:51:01.1089250Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.1089363Z traceback.print_stack() 2023-01-11T22:51:01.1089742Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.1089844Z p_assert( 2023-01-11T22:51:01.1090176Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.1090298Z traceback.print_stack() 2023-01-11T22:51:01.1091039Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1091773Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:01.1092516Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1093251Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1094035Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1094833Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1095556Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1096284Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1097377Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1098128Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1098852Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1099581Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1100304Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1101026Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1101754Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1102475Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1102712Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1102947Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:01.1103128Z File "", line 1, in 2023-01-11T22:51:01.1103338Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.1103486Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.1103689Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.1103837Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.1104045Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.1104148Z self.run() 2023-01-11T22:51:01.1104331Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.1104475Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.1104818Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.1104950Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.1105316Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.1105442Z getattr(self, test_name)() 2023-01-11T22:51:01.1105848Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.1105950Z fn() 2023-01-11T22:51:01.1106298Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.1106421Z test(self, **param_kwargs) 2023-01-11T22:51:01.1106778Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.1106900Z return func(*args, **kwargs) 2023-01-11T22:51:01.1107136Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.1107246Z self.run_subtests( 2023-01-11T22:51:01.1107592Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.1107755Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.1108101Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.1108254Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.1108627Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.1108747Z output = model(*input) 2023-01-11T22:51:01.1109067Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.1109202Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.1109572Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.1109745Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.1110098Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.1110224Z _lazy_init(state, module) 2023-01-11T22:51:01.1110575Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.1110738Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.1111135Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.1111276Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.1111613Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.1111736Z return func(*args, **kwargs) 2023-01-11T22:51:01.1112113Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.1112262Z p_assert( 2023-01-11T22:51:01.1112600Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.1112729Z traceback.print_stack() 2023-01-11T22:51:01.1112858Z File "", line 1, in 2023-01-11T22:51:01.1113065Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.1113207Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.1113406Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.1113540Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.1113746Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.1113851Z self.run() 2023-01-11T22:51:01.1114052Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.1114197Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.1114539Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.1114674Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.1115081Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.1115194Z getattr(self, test_name)() 2023-01-11T22:51:01.1115555Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.1115653Z fn() 2023-01-11T22:51:01.1116013Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.1116134Z test(self, **param_kwargs) 2023-01-11T22:51:01.1116486Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:01.1116609Z return func(*args, **kwargs) 2023-01-11T22:51:01.1116851Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.1116946Z self.run_subtests( 2023-01-11T22:51:01.1117301Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.1117462Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.1117822Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.1117973Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.1118345Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.1118459Z output = model(*input) 2023-01-11T22:51:01.1118784Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.1118907Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.1119279Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.1119455Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.1119822Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.1119942Z _lazy_init(state, module) 2023-01-11T22:51:01.1120293Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.1120458Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.1120852Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.1120977Z handle.init_flat_param_attributes() 
2023-01-11T22:51:01.1121313Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.1121488Z return func(*args, **kwargs) 2023-01-11T22:51:01.1121868Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.1121968Z p_assert( 2023-01-11T22:51:01.1122305Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.1122431Z traceback.print_stack() 2023-01-11T22:51:01.1122665Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1122882Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1123007Z File "", line 1, in 2023-01-11T22:51:01.1123215Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.1123355Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.1123554Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.1123705Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.1123969Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.1124062Z self.run() 2023-01-11T22:51:01.1124261Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.1124405Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.1124744Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.1124875Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.1125230Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.1125353Z getattr(self, test_name)() 2023-01-11T22:51:01.1125708Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.1125793Z fn() 2023-01-11T22:51:01.1126157Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.1126280Z test(self, **param_kwargs) 2023-01-11T22:51:01.1126636Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.1126757Z return func(*args, **kwargs) 2023-01-11T22:51:01.1126991Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.1127105Z self.run_subtests( 2023-01-11T22:51:01.1127452Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.1127597Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.1127956Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.1128107Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.1128487Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.1128606Z output = model(*input) 2023-01-11T22:51:01.1128930Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.1129066Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.1129439Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.1129593Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.1129959Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.1130078Z 
_lazy_init(state, module) 2023-01-11T22:51:01.1130491Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.1130656Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.1131055Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.1131197Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.1131535Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.1131642Z return func(*args, **kwargs) 2023-01-11T22:51:01.1132018Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.1132121Z p_assert( 2023-01-11T22:51:01.1132456Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.1132581Z traceback.print_stack() 2023-01-11T22:51:01.1132712Z File "", line 1, in 2023-01-11T22:51:01.1132919Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.1133107Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.1133296Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.1133446Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.1133656Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.1133758Z self.run() 2023-01-11T22:51:01.1133957Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.1134114Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.1134453Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.1134584Z self.run_test(test_name, pipe) 
2023-01-11T22:51:01.1134923Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.1135051Z getattr(self, test_name)() 2023-01-11T22:51:01.1135409Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.1135505Z fn() 2023-01-11T22:51:01.1135870Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.1135993Z test(self, **param_kwargs) 2023-01-11T22:51:01.1136344Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.1136450Z return func(*args, **kwargs) 2023-01-11T22:51:01.1136964Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.1137086Z self.run_subtests( 2023-01-11T22:51:01.1137442Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.1137606Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.1137969Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.1138118Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.1138488Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.1138589Z output = model(*input) 2023-01-11T22:51:01.1138913Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.1139049Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.1139426Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.1145820Z args, 
kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.1146411Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.1146519Z _lazy_init(state, module) 2023-01-11T22:51:01.1146888Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.1147056Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.1147456Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.1147597Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.1147934Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.1148055Z return func(*args, **kwargs) 2023-01-11T22:51:01.1148424Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.1148514Z p_assert( 2023-01-11T22:51:01.1148845Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.1149038Z traceback.print_stack() 2023-01-11T22:51:01.1149287Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1149520Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:01.1149650Z File "", line 1, in 2023-01-11T22:51:01.1149861Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.1150001Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.1150184Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.1150333Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.1150543Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.1150649Z self.run() 2023-01-11T22:51:01.1150851Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.1150997Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.1151339Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.1151455Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.1151810Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.1151933Z getattr(self, test_name)() 2023-01-11T22:51:01.1152289Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.1152385Z fn() 2023-01-11T22:51:01.1152746Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.1152871Z test(self, **param_kwargs) 2023-01-11T22:51:01.1153227Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.1153339Z return func(*args, **kwargs) 2023-01-11T22:51:01.1153576Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.1153687Z self.run_subtests( 2023-01-11T22:51:01.1154038Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.1154195Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.1154557Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.1154709Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.1155080Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.1155241Z output = model(*input) 2023-01-11T22:51:01.1155568Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.1155708Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.1156086Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.1156261Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.1156627Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.1156748Z _lazy_init(state, module) 2023-01-11T22:51:01.1157097Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.1157247Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.1157642Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.1157787Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.1158171Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.1158300Z return func(*args, **kwargs) 2023-01-11T22:51:01.1158676Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.1158781Z p_assert( 2023-01-11T22:51:01.1159116Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.1159225Z traceback.print_stack() 2023-01-11T22:51:01.1159354Z File "", line 1, in 2023-01-11T22:51:01.1159560Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.1159700Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.1159905Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.1160054Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.1160267Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.1160369Z self.run() 2023-01-11T22:51:01.1160551Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.1160695Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.1161033Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.1161165Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.1161523Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.1161645Z getattr(self, test_name)() 2023-01-11T22:51:01.1161998Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.1162100Z fn() 2023-01-11T22:51:01.1162444Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.1162568Z test(self, **param_kwargs) 2023-01-11T22:51:01.1162925Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:51:01.1163047Z return func(*args, **kwargs) 2023-01-11T22:51:01.1163283Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.1163395Z self.run_subtests( 2023-01-11T22:51:01.1163744Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.1163905Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.1164247Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.1164455Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.1164834Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.1164953Z output = model(*input) 2023-01-11T22:51:01.1165276Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.1165411Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.1165785Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.1165959Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.1166307Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.1166427Z _lazy_init(state, module) 2023-01-11T22:51:01.1166783Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.1166948Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.1167403Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.1167552Z handle.init_flat_param_attributes() 
2023-01-11T22:51:01.1167894Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.1168019Z return func(*args, **kwargs) 2023-01-11T22:51:01.1168376Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.1168477Z p_assert( 2023-01-11T22:51:01.1168809Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.1168934Z traceback.print_stack() 2023-01-11T22:51:01.1169689Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1170428Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1171168Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1171910Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:01.1172638Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1173350Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1174138Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1174865Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1175101Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1175335Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:01.1175466Z File "", line 1, in 2023-01-11T22:51:01.1175576Z File "", line 1, in 2023-01-11T22:51:01.1175792Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.1175933Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.1176181Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.1176341Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.1176795Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.1176956Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.1177170Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.1177259Z self.run() 2023-01-11T22:51:01.1177457Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.1177607Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.1177808Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.1177959Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.1178169Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.1178276Z self.run() 2023-01-11T22:51:01.1178618Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.1178753Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.1178953Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.1179096Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.1179457Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.1179580Z getattr(self, test_name)() 2023-01-11T22:51:01.1179912Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in 
_run 2023-01-11T22:51:01.1180042Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.1180386Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.1180487Z fn() 2023-01-11T22:51:01.1180846Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.1180969Z getattr(self, test_name)() 2023-01-11T22:51:01.1181331Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.1181454Z test(self, **param_kwargs) 2023-01-11T22:51:01.1181806Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.1181904Z fn() 2023-01-11T22:51:01.1182241Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.1182365Z return func(*args, **kwargs) 2023-01-11T22:51:01.1182813Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.1182934Z test(self, **param_kwargs) 2023-01-11T22:51:01.1183174Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.1183284Z self.run_subtests( 2023-01-11T22:51:01.1183640Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.1183747Z return func(*args, **kwargs) 2023-01-11T22:51:01.1184093Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.1184253Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.1184489Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 
2023-01-11T22:51:01.1184600Z self.run_subtests( 2023-01-11T22:51:01.1184962Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.1185115Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.1185519Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.1185684Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.1186046Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.1186163Z output = model(*input) 2023-01-11T22:51:01.1186519Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.1186667Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.1186986Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.1187127Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.1187500Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.1187620Z output = model(*input) 2023-01-11T22:51:01.1187978Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.1188151Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.1188477Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.1188613Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.1188976Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.1189093Z _lazy_init(state, module) 2023-01-11T22:51:01.1189466Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.1189640Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.1189976Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.1190141Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.1190502Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.1190622Z _lazy_init(state, module) 2023-01-11T22:51:01.1191016Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.1191158Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.1191503Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.1191668Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.1192046Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.1192172Z return func(*args, **kwargs) 2023-01-11T22:51:01.1192569Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.1192708Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.1193082Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.1193183Z p_assert( 2023-01-11T22:51:01.1193577Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.1193701Z return func(*args, **kwargs) 2023-01-11T22:51:01.1194023Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.1194148Z traceback.print_stack() 2023-01-11T22:51:01.1194526Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.1194625Z p_assert( 2023-01-11T22:51:01.1195052Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.1195181Z traceback.print_stack() 2023-01-11T22:51:01.1195416Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1195647Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1195758Z File "", line 1, in 2023-01-11T22:51:01.1195967Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.1196108Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.1196304Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.1196457Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.1196667Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.1196770Z self.run() 2023-01-11T22:51:01.1196956Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.1197103Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.1197444Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.1197575Z self.run_test(test_name, pipe) 2023-01-11T22:51:01.1197934Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.1198055Z getattr(self, test_name)() 2023-01-11T22:51:01.1198408Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.1198503Z fn() 2023-01-11T22:51:01.1198851Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.1198971Z test(self, **param_kwargs) 2023-01-11T22:51:01.1199324Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.1199446Z return func(*args, **kwargs) 2023-01-11T22:51:01.1199685Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.1199796Z self.run_subtests( 2023-01-11T22:51:01.1200143Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.1200301Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.1200642Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.1200797Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.1201224Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.1201343Z output = model(*input) 2023-01-11T22:51:01.1201670Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.1201807Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.1202179Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.1202352Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.1202699Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.1202820Z 
_lazy_init(state, module) 2023-01-11T22:51:01.1203169Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.1203339Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.1203777Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.1203925Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.1204261Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.1204384Z return func(*args, **kwargs) 2023-01-11T22:51:01.1204742Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.1204845Z p_assert( 2023-01-11T22:51:01.1205179Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.1205304Z traceback.print_stack() 2023-01-11T22:51:01.1205431Z File "", line 1, in 2023-01-11T22:51:01.1205638Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:51:01.1205781Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:51:01.1205982Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:51:01.1206115Z return self._bootstrap(parent_sentinel) 2023-01-11T22:51:01.1206325Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:51:01.1206426Z self.run() 2023-01-11T22:51:01.1206625Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:51:01.1206769Z self._target(*self._args, **self._kwargs) 2023-01-11T22:51:01.1207098Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:51:01.1207230Z self.run_test(test_name, pipe) 
2023-01-11T22:51:01.1207587Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:51:01.1207696Z getattr(self, test_name)() 2023-01-11T22:51:01.1208048Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:51:01.1208141Z fn() 2023-01-11T22:51:01.1208505Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:51:01.1208628Z test(self, **param_kwargs) 2023-01-11T22:51:01.1208985Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:51:01.1209109Z return func(*args, **kwargs) 2023-01-11T22:51:01.1209327Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:51:01.1209441Z self.run_subtests( 2023-01-11T22:51:01.1209790Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:51:01.1209950Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:51:01.1210372Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:51:01.1210529Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:51:01.1210901Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:51:01.1211018Z output = model(*input) 2023-01-11T22:51:01.1211341Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:51:01.1211460Z return forward_call(*args, **kwargs) 2023-01-11T22:51:01.1211833Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:51:01.1212005Z args, 
kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:51:01.1212372Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:51:01.1212495Z _lazy_init(state, module) 2023-01-11T22:51:01.1212884Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:51:01.1213057Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:51:01.1213452Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:51:01.1213579Z handle.init_flat_param_attributes() 2023-01-11T22:51:01.1213911Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:51:01.1214032Z return func(*args, **kwargs) 2023-01-11T22:51:01.1214405Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:51:01.1214506Z p_assert( 2023-01-11T22:51:01.1214840Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:51:01.1214969Z traceback.print_stack() 2023-01-11T22:51:01.1215206Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1215423Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1216169Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1217227Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1217982Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1218716Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1219446Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1220273Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1221001Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:01.1221727Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1222510Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1223246Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1223966Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1224700Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1224933Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1225169Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:01.1225402Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1225631Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1225858Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1226081Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1226796Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1227522Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1228251Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1229040Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1229275Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:01.1229507Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1229732Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1229957Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1230682Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1231464Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1232201Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1232924Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1233160Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1233389Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:01.1233616Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1233842Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1234052Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1234278Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1235011Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1235739Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1236465Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1237189Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1237475Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:51:01.1237702Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1237930Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1238157Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1238382Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1238605Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1238812Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1239036Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1239813Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1240545Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1241270Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1242001Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1242228Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1242454Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1242681Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1242903Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1243128Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1243356Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1244072Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1244796Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1245525Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:01.1246312Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1246544Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1246770Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1246998Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1247224Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1247951Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1248719Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1249455Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1250180Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1250419Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1250652Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1250879Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1251105Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1251312Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1251534Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1252259Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1252989Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1253716Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:51:01.1254442Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:51:01.1254731Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1254960Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1255186Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1255413Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1255636Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1255860Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:51:01.1255954Z dist init r=0, world=2 2023-01-11T22:51:01.1256278Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:01.1256890Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:01.1257222Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:01.1257525Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:01.1257825Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:01.1258122Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:01.1258423Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:01.1258725Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:01.1259023Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:01.1259329Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:01.1259612Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:01.1259911Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:51:01.1260025Z dist init r=1, world=2 2023-01-11T22:51:01.1260346Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:01.1260660Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:01.1260965Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:01.1261269Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:01.1261643Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:01.1261948Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:01.1262247Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:01.1262543Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:01.1262824Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:01.1263122Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:01.1263466Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:51:01.1263771Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 
2023-01-11T22:51:01.1263874Z ok (11.121s) 2023-01-11T22:51:01.1263899Z 2023-01-11T22:51:01.1264179Z ---------------------------------------------------------------------- 2023-01-11T22:51:01.1264297Z Ran 59 tests in 609.314s 2023-01-11T22:51:01.1264317Z 2023-01-11T22:51:01.1264424Z OK (skipped=5) 2023-01-11T22:51:01.1264443Z 2023-01-11T22:51:01.1264565Z Generating XML reports... 2023-01-11T22:51:01.1264969Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_core/TEST-TestHooks-20230111224050.xml 2023-01-11T22:51:01.1265354Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_core/TEST-TestNoGrad-20230111224050.xml 2023-01-11T22:51:01.1265758Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_core/TEST-TestParamInit-20230111224050.xml 2023-01-11T22:51:01.1266183Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_core/TEST-TestParityWithDDP-20230111224050.xml 2023-01-11T22:51:01.1266204Z 2023-01-11T22:51:01.1266668Z ##[endgroup] 2023-01-11T22:51:01.1267121Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_core (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_core_dgo9oq20) 2023-01-11T22:51:01.1267142Z 2023-01-11T22:51:01.1267198Z 2023-01-11T22:51:01.1267305Z real 97m53.272s 2023-01-11T22:51:01.1267409Z user 160m28.183s 2023-01-11T22:51:01.1267511Z sys 87m14.862s 2023-01-11T22:51:01.1267607Z + assert_git_not_dirty 2023-01-11T22:51:01.1267847Z + [[ linux-bionic-cuda11.6-py3.10-gcc7 != *rocm* ]] 2023-01-11T22:51:01.1268080Z + [[ linux-bionic-cuda11.6-py3.10-gcc7 != *xla* ]] 2023-01-11T22:51:01.1268236Z ++ git status --porcelain 2023-01-11T22:51:02.0304541Z + git_status= 2023-01-11T22:51:02.0304980Z + [[ -n '' ]] 2023-01-11T22:51:02.0305390Z + [[ linux-bionic-cuda11.6-py3.10-gcc7 == *cuda* ]] 2023-01-11T22:51:02.0305690Z + [[ 3 == 1 ]] 2023-01-11T22:51:02.0305894Z + [[ 3 == 1 ]] 2023-01-11T22:51:02.0371663Z ##[group]Run cat 
test/**/*.log || true 2023-01-11T22:51:02.0371996Z cat test/**/*.log || true 2023-01-11T22:51:02.0386045Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T22:51:02.0386354Z env: 2023-01-11T22:51:02.0386597Z GIT_DEFAULT_BRANCH: master 2023-01-11T22:51:02.0386848Z GPU_FLAG: --gpus all 2023-01-11T22:51:02.0387207Z DOCKER_CONTAINER_ID: 7c5487d9c02b51c7d3fc373594ecc7b146547c11845b5a7279d2f69d27bfcc5b 2023-01-11T22:51:02.0387553Z ##[endgroup] 2023-01-11T22:51:02.0433579Z cat: test/**/*.log: No such file or directory 2023-01-11T22:51:02.0465266Z Prepare all required actions 2023-01-11T22:51:02.0465695Z Getting action download info 2023-01-11T22:51:02.2261333Z ##[group]Run ./.github/actions/get-workflow-job-id 2023-01-11T22:51:02.2261628Z with: 2023-01-11T22:51:02.2262078Z github-token: *** 2023-01-11T22:51:02.2262294Z env: 2023-01-11T22:51:02.2262532Z GIT_DEFAULT_BRANCH: master 2023-01-11T22:51:02.2262799Z GPU_FLAG: --gpus all 2023-01-11T22:51:02.2263133Z DOCKER_CONTAINER_ID: 7c5487d9c02b51c7d3fc373594ecc7b146547c11845b5a7279d2f69d27bfcc5b 2023-01-11T22:51:02.2263470Z ##[endgroup] 2023-01-11T22:51:02.2297430Z ##[group]Run nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482 2023-01-11T22:51:02.2297747Z with: 2023-01-11T22:51:02.2297970Z shell: bash 2023-01-11T22:51:02.2298193Z timeout_minutes: 10 2023-01-11T22:51:02.2298443Z max_attempts: 5 2023-01-11T22:51:02.2298693Z retry_wait_seconds: 30 2023-01-11T22:51:02.2299223Z command: set -eux python3 -m pip install requests==2.26.0 GHA_WORKFLOW_JOB_ID=$(python3 .github/scripts/get_workflow_job_id.py "${GITHUB_RUN_ID}" "${RUNNER_NAME}") echo "job-id=${GHA_WORKFLOW_JOB_ID}" >> "${GITHUB_OUTPUT}" 2023-01-11T22:51:02.2299736Z polling_interval_seconds: 1 2023-01-11T22:51:02.2299989Z warning_on_retry: true 2023-01-11T22:51:02.2300254Z continue_on_error: false 2023-01-11T22:51:02.2300496Z env: 2023-01-11T22:51:02.2300713Z GIT_DEFAULT_BRANCH: master 2023-01-11T22:51:02.2300979Z GPU_FLAG: --gpus all 
2023-01-11T22:51:02.2301333Z DOCKER_CONTAINER_ID: 7c5487d9c02b51c7d3fc373594ecc7b146547c11845b5a7279d2f69d27bfcc5b 2023-01-11T22:51:02.2301804Z GITHUB_TOKEN: *** 2023-01-11T22:51:02.2302049Z ##[endgroup] 2023-01-11T22:51:02.2972672Z + python3 -m pip install requests==2.26.0 2023-01-11T22:51:02.5860916Z Defaulting to user installation because normal site-packages is not writeable 2023-01-11T22:51:02.7265930Z Collecting requests==2.26.0 2023-01-11T22:51:02.7488181Z Downloading requests-2.26.0-py2.py3-none-any.whl (62 kB) 2023-01-11T22:51:02.8341071Z Collecting certifi>=2017.4.17 2023-01-11T22:51:02.8387097Z Downloading certifi-2022.12.7-py3-none-any.whl (155 kB) 2023-01-11T22:51:02.8891674Z Collecting idna<4,>=2.5; python_version >= "3" 2023-01-11T22:51:02.8938903Z Downloading idna-3.4-py3-none-any.whl (61 kB) 2023-01-11T22:51:03.0029025Z Collecting urllib3<1.27,>=1.21.1 2023-01-11T22:51:03.0075779Z Downloading urllib3-1.26.14-py2.py3-none-any.whl (140 kB) 2023-01-11T22:51:03.2071106Z Collecting charset-normalizer~=2.0.0; python_version >= "3" 2023-01-11T22:51:03.2117999Z Downloading charset_normalizer-2.0.12-py3-none-any.whl (39 kB) 2023-01-11T22:51:03.2998355Z Installing collected packages: certifi, idna, urllib3, charset-normalizer, requests 2023-01-11T22:51:03.5030951Z WARNING: The script normalizer is installed in '/home/ec2-user/.local/bin' which is not on PATH. 2023-01-11T22:51:03.5031602Z Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location. 2023-01-11T22:51:03.5517487Z Successfully installed certifi-2022.12.7 charset-normalizer-2.0.12 idna-3.4 requests-2.26.0 urllib3-1.26.14 2023-01-11T22:51:03.6022580Z ++ python3 .github/scripts/get_workflow_job_id.py 3896099317 i-0f0fe094d8805bec6 2023-01-11T22:51:05.7430836Z + GHA_WORKFLOW_JOB_ID=10589292222 2023-01-11T22:51:05.7431573Z + echo job-id=10589292222 2023-01-11T22:51:06.2967240Z Command completed after 1 attempt(s). 
2023-01-11T22:51:06.3104123Z ##[group]Run kill "$MONITOR_SCRIPT_PID" 2023-01-11T22:51:06.3104476Z kill "$MONITOR_SCRIPT_PID" 2023-01-11T22:51:06.3117872Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T22:51:06.3118175Z env: 2023-01-11T22:51:06.3118403Z GIT_DEFAULT_BRANCH: master 2023-01-11T22:51:06.3118673Z GPU_FLAG: --gpus all 2023-01-11T22:51:06.3119031Z DOCKER_CONTAINER_ID: 7c5487d9c02b51c7d3fc373594ecc7b146547c11845b5a7279d2f69d27bfcc5b 2023-01-11T22:51:06.3119376Z MONITOR_SCRIPT_PID: 60099 2023-01-11T22:51:06.3119638Z ##[endgroup] 2023-01-11T22:51:06.3222554Z Prepare all required actions 2023-01-11T22:51:06.3223051Z Getting action download info 2023-01-11T22:51:06.4898101Z Download action repository 'actions/upload-artifact@v3' (SHA:0b7f8abb1508181956e8e162db84b466c27e18ce) 2023-01-11T22:51:06.6476764Z ##[group]Run ./.github/actions/upload-test-artifacts 2023-01-11T22:51:06.6477042Z with: 2023-01-11T22:51:06.6477394Z file-suffix: test-distributed-3-3-linux.8xlarge.nvidia.gpu_10589292222 2023-01-11T22:51:06.6477739Z env: 2023-01-11T22:51:06.6477961Z GIT_DEFAULT_BRANCH: master 2023-01-11T22:51:06.6478223Z GPU_FLAG: --gpus all 2023-01-11T22:51:06.6478574Z DOCKER_CONTAINER_ID: 7c5487d9c02b51c7d3fc373594ecc7b146547c11845b5a7279d2f69d27bfcc5b 2023-01-11T22:51:06.6478888Z ##[endgroup] 2023-01-11T22:51:06.6507044Z ##[group]Run # Remove any previous test jsons if they exist 2023-01-11T22:51:06.6507400Z # Remove any previous test jsons if they exist 2023-01-11T22:51:06.6507714Z rm -f test-jsons-*.zip 2023-01-11T22:51:06.6508081Z zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' 2023-01-11T22:51:06.6520100Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T22:51:06.6520397Z env: 2023-01-11T22:51:06.6520643Z GIT_DEFAULT_BRANCH: master 2023-01-11T22:51:06.6520890Z GPU_FLAG: --gpus all 2023-01-11T22:51:06.6521248Z DOCKER_CONTAINER_ID: 7c5487d9c02b51c7d3fc373594ecc7b146547c11845b5a7279d2f69d27bfcc5b 
2023-01-11T22:51:06.6521709Z FILE_SUFFIX: test-distributed-3-3-linux.8xlarge.nvidia.gpu_10589292222 2023-01-11T22:51:06.6522044Z ##[endgroup] 2023-01-11T22:51:06.6674940Z adding: test/allowlist_for_publicAPI.json (deflated 78%) 2023-01-11T22:51:06.6709523Z adding: test/benchmark_utils/callgrind_artifacts.json (deflated 92%) 2023-01-11T22:51:06.6717044Z adding: test/profiler/profiler_utils_mock_events.json (deflated 87%) 2023-01-11T22:51:06.6718924Z adding: test/.pytorch-slow-tests.json (deflated 74%) 2023-01-11T22:51:06.6724711Z adding: test/.pytorch-disabled-tests.json (deflated 84%) 2023-01-11T22:51:06.6750767Z ##[group]Run # Remove any previous test reports if they exist 2023-01-11T22:51:06.6751142Z # Remove any previous test reports if they exist 2023-01-11T22:51:06.6751465Z rm -f test-reports-*.zip 2023-01-11T22:51:06.6751822Z zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' -i '*.csv' 2023-01-11T22:51:06.6763711Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T22:51:06.6764009Z env: 2023-01-11T22:51:06.6764252Z GIT_DEFAULT_BRANCH: master 2023-01-11T22:51:06.6764500Z GPU_FLAG: --gpus all 2023-01-11T22:51:06.6764855Z DOCKER_CONTAINER_ID: 7c5487d9c02b51c7d3fc373594ecc7b146547c11845b5a7279d2f69d27bfcc5b 2023-01-11T22:51:06.6765314Z FILE_SUFFIX: test-distributed-3-3-linux.8xlarge.nvidia.gpu_10589292222 2023-01-11T22:51:06.6765663Z ##[endgroup] 2023-01-11T22:51:06.6914541Z adding: test/test-reports/python-unittest/distributed.checkpoint.test_2d_model_state_checkpoint/TEST-Test2dModelStateCheckpoint-20230111211314.xml (deflated 45%) 2023-01-11T22:51:06.6915424Z adding: test/test-reports/python-unittest/distributed._tensor.test_device_mesh/TEST-DeviceMeshCollectiveTest-20230111211320.xml (deflated 88%) 2023-01-11T22:51:06.6916200Z adding: test/test-reports/python-unittest/distributed._tensor.test_device_mesh/TEST-DeviceMeshTest-20230111211320.xml (deflated 73%) 2023-01-11T22:51:06.6921163Z adding: 
test/test-reports/python-unittest/distributed.fsdp.test_fsdp_optim_state/TEST-TestFSDPOptimState-20230111211412.xml (deflated 93%) 2023-01-11T22:51:06.6921938Z adding: test/test-reports/python-unittest/distributed.elastic.metrics.api_test/TEST-MetricsApiTest-20230111221056.xml (deflated 62%) 2023-01-11T22:51:06.6922715Z adding: test/test-reports/python-unittest/distributed.checkpoint.test_utils/TEST-TestMedatadaIndex-20230111221100.xml (deflated 71%) 2023-01-11T22:51:06.6923628Z adding: test/test-reports/python-unittest/distributed.checkpoint.test_nested_dict/TEST-TestFlattening-20230111221104.xml (deflated 54%) 2023-01-11T22:51:06.6924423Z adding: test/test-reports/python-unittest/distributed.elastic.utils.logging_test/TEST-LoggingTest-20230111221108.xml (deflated 54%) 2023-01-11T22:51:06.6925281Z adding: test/test-reports/python-unittest/distributed.elastic.utils.util_test/TEST-StoreUtilTest-20230111221112.xml (deflated 63%) 2023-01-11T22:51:06.6926000Z adding: test/test-reports/python-unittest/distributed.elastic.utils.util_test/TEST-UtilTest-20230111221112.xml (deflated 70%) 2023-01-11T22:51:06.6926788Z adding: test/test-reports/python-unittest/distributed.test_multi_threaded_pg/TEST-TestCollectivesWithBaseClass-20230111221116.xml (deflated 76%) 2023-01-11T22:51:06.6927616Z adding: test/test-reports/python-unittest/distributed.test_multi_threaded_pg/TEST-TestCollectivesWithWrapper-20230111221116.xml (deflated 74%) 2023-01-11T22:51:06.6928389Z adding: test/test-reports/python-unittest/distributed.rpc.test_share_memory/TEST-TestRPCPickler-20230111221122.xml (deflated 42%) 2023-01-11T22:51:06.6929188Z adding: test/test-reports/python-unittest/distributed.elastic.utils.distributed_test/TEST-DistributedUtilTest-20230111221129.xml (deflated 78%) 2023-01-11T22:51:06.6930012Z adding: test/test-reports/python-unittest/distributed.elastic.timer.local_timer_test/TEST-LocalTimerServerTest-20230111221136.xml (deflated 71%) 2023-01-11T22:51:06.6930815Z adding: 
test/test-reports/python-unittest/distributed.elastic.timer.local_timer_test/TEST-LocalTimerTest-20230111221136.xml (deflated 69%) 2023-01-11T22:51:06.6931953Z adding: test/test-reports/python-unittest/distributed.elastic.timer.local_timer_test/TEST-MultiprocessingRequestQueueTest-20230111221136.xml (deflated 66%) 2023-01-11T22:51:06.6932851Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_multiple_forward/TEST-TestMultiForward-20230111221144.xml (deflated 41%) 2023-01-11T22:51:06.6933667Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_softmax/TEST-TestShardedSoftmax-20230111221153.xml (deflated 59%) 2023-01-11T22:51:06.6934502Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_embedding/TEST-TestShardedEmbedding-20230111221201.xml (deflated 60%) 2023-01-11T22:51:06.6935291Z adding: test/test-reports/python-unittest/distributed.test_c10d_error_logger/TEST-C10dErrorLoggerTest-20230111221210.xml (deflated 53%) 2023-01-11T22:51:06.6936125Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_linear/TEST-TestShardedTensorOpsLinear-20230111221220.xml (deflated 69%) 2023-01-11T22:51:06.6937252Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_pure_fp16/TEST-TestPureFP16-20230111221231.xml (deflated 60%) 2023-01-11T22:51:06.6938126Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_elementwise_ops/TEST-TestShardedTensorElementWiseOps-20230111221244.xml (deflated 74%) 2023-01-11T22:51:06.6938998Z adding: test/test-reports/python-unittest/distributed._shard.sharding_plan.test_sharding_plan/TEST-TestShardingPlan-20230111221257.xml (deflated 75%) 2023-01-11T22:51:06.6939745Z adding: test/test-reports/python-unittest/distributed._tensor.test_api/TEST-DTensorAPITest-20230111221313.xml (deflated 75%) 2023-01-11T22:51:06.6940518Z adding: 
test/test-reports/python-unittest/distributed._composable.test_replicate/TEST-ReplicateStateDictTest-20230111221329.xml (deflated 60%) 2023-01-11T22:51:06.6941295Z adding: test/test-reports/python-unittest/distributed._composable.test_replicate/TEST-ReplicateTest-20230111221329.xml (deflated 63%) 2023-01-11T22:51:06.6942114Z adding: test/test-reports/python-unittest/distributed.tensor.parallel.test_parallelize_api/TEST-TensorParallelAPITests-20230111221345.xml (deflated 79%) 2023-01-11T22:51:06.6942965Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_tp_integration/TEST-TestTPFSDPIntegration-20230111221403.xml (deflated 80%) 2023-01-11T22:51:06.6943942Z adding: test/test-reports/python-unittest/distributed.checkpoint.test_checkpoint/TEST-TestDistributedCheckpointing-20230111221427.xml (deflated 55%) 2023-01-11T22:51:06.6944888Z adding: test/test-reports/python-unittest/distributed.checkpoint.test_checkpoint/TEST-TestDistributedFailure-20230111221427.xml (deflated 77%) 2023-01-11T22:51:06.6945723Z adding: test/test-reports/python-unittest/distributed.tensor.parallel.test_tp_style/TEST-TensorParallelStyleTest-20230111221453.xml (deflated 82%) 2023-01-11T22:51:06.6946601Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_matrix_ops/TEST-TestShardedTensorMatrixOps-20230111221516.xml (deflated 86%) 2023-01-11T22:51:06.6947400Z adding: test/test-reports/python-unittest/distributed._tensor.test_matrix_ops/TEST-DistMatrixOpsTest-20230111221546.xml (deflated 75%) 2023-01-11T22:51:06.6948173Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_flatten_params/TEST-TestFlattenParams-20230111221622.xml (deflated 77%) 2023-01-11T22:51:06.6948907Z adding: test/test-reports/python-unittest/distributed.test_c10d_common/TEST-CommTest-20230111221703.xml (deflated 38%) 2023-01-11T22:51:06.6949661Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_common/TEST-ComputeBucketAssignmentTest-20230111221710.xml (deflated 42%) 2023-01-11T22:51:06.6950469Z adding: test/test-reports/python-unittest/distributed.test_c10d_common/TEST-ComputeBucketAssignmentTest-20230111221714.xml (deflated 40%) 2023-01-11T22:51:06.6951253Z adding: test/test-reports/python-unittest/distributed.test_c10d_common/TEST-ComputeBucketAssignmentTest-20230111221717.xml (deflated 40%) 2023-01-11T22:51:06.6952051Z adding: test/test-reports/python-unittest/distributed.test_c10d_common/TEST-ComputeBucketAssignmentTest-20230111221721.xml (deflated 42%) 2023-01-11T22:51:06.6952871Z adding: test/test-reports/python-unittest/distributed.test_c10d_common/TEST-PythonProcessGroupExtensionTest-20230111221725.xml (deflated 41%) 2023-01-11T22:51:06.6953722Z adding: test/test-reports/python-unittest/distributed.test_c10d_common/TEST-PythonProcessGroupExtensionTest-20230111221731.xml (deflated 41%) 2023-01-11T22:51:06.6954566Z adding: test/test-reports/python-unittest/distributed.test_c10d_common/TEST-PythonProcessGroupExtensionTest-20230111221740.xml (deflated 41%) 2023-01-11T22:51:06.6955400Z adding: test/test-reports/python-unittest/distributed.test_c10d_common/TEST-PythonProcessGroupExtensionTest-20230111221747.xml (deflated 41%) 2023-01-11T22:51:06.6956159Z adding: test/test-reports/python-unittest/distributed.test_c10d_common/TEST-ReduceOpTest-20230111221755.xml (deflated 39%) 2023-01-11T22:51:06.6956855Z adding: test/test-reports/python-unittest/distributed.test_c10d_common/TEST-ReduceOpTest-20230111221759.xml (deflated 39%) 2023-01-11T22:51:06.6957527Z adding: test/test-reports/python-unittest/distributed.test_c10d_common/TEST-ReduceOpTest-20230111221803.xml (deflated 39%) 2023-01-11T22:51:06.6958182Z adding: test/test-reports/python-unittest/distributed.test_c10d_common/TEST-ReduceOpTest-20230111221806.xml (deflated 39%) 2023-01-11T22:51:06.6958917Z adding: 
test/test-reports/python-unittest/distributed.fsdp.test_fsdp_comm/TEST-TestCommunication-20230111221811.xml (deflated 91%) 2023-01-11T22:51:06.6959700Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_freezing_weights/TEST-TestFreezingWeights-20230111221852.xml (deflated 84%) 2023-01-11T22:51:06.6960897Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_grad_acc/TEST-TestGradAcc-20230111221940.xml (deflated 93%) 2023-01-11T22:51:06.6962333Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_misc/TEST-TestFSDPMisc-20230111222047.xml (deflated 77%) 2023-01-11T22:51:06.6963082Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_misc/TEST-TestFSDPMiscWorldSize1-20230111222047.xml (deflated 41%) 2023-01-11T22:51:06.6964606Z adding: test/test-reports/python-unittest/distributed.fsdp.test_wrap/TEST-TestAutoWrap-20230111222205.xml (deflated 81%) 2023-01-11T22:51:06.6965953Z adding: test/test-reports/python-unittest/distributed.fsdp.test_wrap/TEST-TestFSDPWrap-20230111222205.xml (deflated 89%) 2023-01-11T22:51:06.6968880Z adding: test/test-reports/python-unittest/distributed.optim.test_zero_redundancy_optimizer/TEST-TestZeroRedundancyOptimizerDistributed-20230111222347.xml (deflated 90%) 2023-01-11T22:51:06.6969862Z adding: test/test-reports/python-unittest/distributed.optim.test_zero_redundancy_optimizer/TEST-TestZeroRedundancyOptimizerSingleRank-20230111222347.xml (deflated 73%) 2023-01-11T22:51:06.6970639Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111222654.xml (deflated 38%) 2023-01-11T22:51:06.6971296Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111222700.xml (deflated 38%) 2023-01-11T22:51:06.6971959Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111222708.xml (deflated 38%) 2023-01-11T22:51:06.6972623Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111222714.xml (deflated 38%) 2023-01-11T22:51:06.6973271Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111222720.xml (deflated 38%) 2023-01-11T22:51:06.6973931Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111222728.xml (deflated 38%) 2023-01-11T22:51:06.6974588Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111222736.xml (deflated 39%) 2023-01-11T22:51:06.6975234Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111222742.xml (deflated 38%) 2023-01-11T22:51:06.6975870Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111222749.xml (deflated 38%) 2023-01-11T22:51:06.6976527Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111222755.xml (deflated 37%) 2023-01-11T22:51:06.6977447Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111222801.xml (deflated 38%) 2023-01-11T22:51:06.6978128Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111222808.xml (deflated 38%) 2023-01-11T22:51:06.6978815Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111222814.xml (deflated 38%) 2023-01-11T22:51:06.6979479Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111222822.xml (deflated 38%) 2023-01-11T22:51:06.7001476Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111222828.xml (deflated 38%) 2023-01-11T22:51:06.7002282Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111222836.xml (deflated 38%) 2023-01-11T22:51:06.7003036Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111222843.xml (deflated 38%) 2023-01-11T22:51:06.7003770Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111222851.xml (deflated 38%) 2023-01-11T22:51:06.7004488Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111222857.xml (deflated 38%) 2023-01-11T22:51:06.7005224Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111222905.xml (deflated 39%) 2023-01-11T22:51:06.7005956Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111222911.xml (deflated 38%) 2023-01-11T22:51:06.7006682Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111222917.xml (deflated 38%) 2023-01-11T22:51:06.7007464Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111222925.xml (deflated 45%) 2023-01-11T22:51:06.7008466Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111222934.xml (deflated 44%) 2023-01-11T22:51:06.7009405Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111222942.xml (deflated 43%) 2023-01-11T22:51:06.7010263Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111222951.xml (deflated 43%) 2023-01-11T22:51:06.7011094Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111222959.xml (deflated 45%) 2023-01-11T22:51:06.7011975Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223007.xml (deflated 45%) 2023-01-11T22:51:06.7012820Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223016.xml (deflated 47%) 2023-01-11T22:51:06.7013672Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223024.xml (deflated 47%) 2023-01-11T22:51:06.7014526Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223032.xml (deflated 44%) 2023-01-11T22:51:06.7015355Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223041.xml (deflated 46%) 2023-01-11T22:51:06.7016206Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223049.xml (deflated 46%) 2023-01-11T22:51:06.7017313Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223058.xml (deflated 44%) 2023-01-11T22:51:06.7018169Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223106.xml (deflated 44%) 2023-01-11T22:51:06.7019001Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223114.xml (deflated 43%) 2023-01-11T22:51:06.7019855Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223121.xml (deflated 44%) 2023-01-11T22:51:06.7020703Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223128.xml (deflated 45%) 2023-01-11T22:51:06.7021553Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223135.xml (deflated 44%) 2023-01-11T22:51:06.7022374Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223141.xml (deflated 45%) 2023-01-11T22:51:06.7023224Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223147.xml (deflated 45%) 2023-01-11T22:51:06.7024073Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223153.xml (deflated 50%) 2023-01-11T22:51:06.7024920Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223201.xml (deflated 42%) 2023-01-11T22:51:06.7025752Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223209.xml (deflated 42%) 2023-01-11T22:51:06.7026596Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223217.xml (deflated 41%) 2023-01-11T22:51:06.7027436Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223225.xml (deflated 42%) 2023-01-11T22:51:06.7028279Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223233.xml (deflated 42%) 2023-01-11T22:51:06.7029097Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223241.xml (deflated 42%) 2023-01-11T22:51:06.7030036Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223248.xml (deflated 42%) 2023-01-11T22:51:06.7030991Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223254.xml (deflated 41%) 2023-01-11T22:51:06.7031831Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223300.xml (deflated 41%) 2023-01-11T22:51:06.7032682Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223306.xml (deflated 44%) 2023-01-11T22:51:06.7033506Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223313.xml (deflated 45%) 2023-01-11T22:51:06.7034349Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223319.xml (deflated 41%) 2023-01-11T22:51:06.7035196Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223327.xml (deflated 41%) 2023-01-11T22:51:06.7036034Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223333.xml (deflated 41%) 2023-01-11T22:51:06.7036862Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223341.xml (deflated 41%) 2023-01-11T22:51:06.7037709Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223348.xml (deflated 42%) 2023-01-11T22:51:06.7038555Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223354.xml (deflated 41%) 2023-01-11T22:51:06.7039398Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223403.xml (deflated 41%) 2023-01-11T22:51:06.7040326Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-GlooProcessGroupWithDispatchedCollectivesTests-20230111223412.xml (deflated 42%) 2023-01-11T22:51:06.7041358Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-GlooProcessGroupWithDispatchedCollectivesTests-20230111223418.xml (deflated 43%) 2023-01-11T22:51:06.7042381Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-GlooProcessGroupWithDispatchedCollectivesTests-20230111223424.xml (deflated 42%) 2023-01-11T22:51:06.7043407Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-GlooProcessGroupWithDispatchedCollectivesTests-20230111223430.xml (deflated 44%) 2023-01-11T22:51:06.7044409Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-GlooProcessGroupWithDispatchedCollectivesTests-20230111223436.xml (deflated 42%) 2023-01-11T22:51:06.7045312Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223442.xml (deflated 39%) 2023-01-11T22:51:06.7046118Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223449.xml (deflated 39%) 2023-01-11T22:51:06.7046922Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223457.xml (deflated 39%) 2023-01-11T22:51:06.7047692Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223503.xml (deflated 40%) 2023-01-11T22:51:06.7048490Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223510.xml (deflated 40%) 2023-01-11T22:51:06.7049287Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223516.xml (deflated 39%) 2023-01-11T22:51:06.7050081Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223522.xml (deflated 40%) 2023-01-11T22:51:06.7050920Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223529.xml (deflated 39%) 2023-01-11T22:51:06.7051772Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223539.xml (deflated 40%) 2023-01-11T22:51:06.7052572Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223545.xml (deflated 40%) 2023-01-11T22:51:06.7053367Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223554.xml (deflated 39%) 2023-01-11T22:51:06.7054139Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223602.xml (deflated 39%) 2023-01-11T22:51:06.7054937Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223608.xml (deflated 40%) 2023-01-11T22:51:06.7055734Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223615.xml (deflated 39%) 2023-01-11T22:51:06.7056702Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223621.xml (deflated 40%) 2023-01-11T22:51:06.7057509Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223628.xml (deflated 40%) 2023-01-11T22:51:06.7058370Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223634.xml (deflated 39%) 2023-01-11T22:51:06.7059147Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223642.xml (deflated 40%) 2023-01-11T22:51:06.7059940Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223649.xml (deflated 40%) 2023-01-11T22:51:06.7060725Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223656.xml (deflated 40%) 2023-01-11T22:51:06.7061524Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223704.xml (deflated 40%) 2023-01-11T22:51:06.7062302Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223711.xml (deflated 40%) 2023-01-11T22:51:06.7063096Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223717.xml (deflated 39%) 2023-01-11T22:51:06.7063893Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223725.xml (deflated 39%) 2023-01-11T22:51:06.7064690Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223732.xml (deflated 40%) 2023-01-11T22:51:06.7065461Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223738.xml (deflated 40%) 2023-01-11T22:51:06.7066258Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223747.xml (deflated 40%) 2023-01-11T22:51:06.7067060Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223753.xml (deflated 39%) 2023-01-11T22:51:06.7067854Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223800.xml (deflated 40%) 2023-01-11T22:51:06.7068628Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223808.xml (deflated 40%) 2023-01-11T22:51:06.7069416Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223815.xml (deflated 39%) 2023-01-11T22:51:06.7070208Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223821.xml (deflated 40%) 2023-01-11T22:51:06.7071081Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223828.xml (deflated 39%) 2023-01-11T22:51:06.7071876Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223838.xml (deflated 40%) 2023-01-11T22:51:06.7072739Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223845.xml (deflated 39%) 2023-01-11T22:51:06.7073529Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223851.xml (deflated 39%) 2023-01-11T22:51:06.7074321Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223859.xml (deflated 39%) 2023-01-11T22:51:06.7075094Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223906.xml (deflated 40%) 2023-01-11T22:51:06.7075891Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223913.xml (deflated 39%) 2023-01-11T22:51:06.7076688Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223922.xml (deflated 39%) 2023-01-11T22:51:06.7077484Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223928.xml (deflated 40%) 2023-01-11T22:51:06.7078257Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223935.xml (deflated 39%) 2023-01-11T22:51:06.7079051Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223941.xml (deflated 39%) 2023-01-11T22:51:06.7079849Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223950.xml (deflated 39%) 2023-01-11T22:51:06.7080642Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223956.xml (deflated 40%) 2023-01-11T22:51:06.7081421Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224003.xml (deflated 41%) 2023-01-11T22:51:06.7082219Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224005.xml (deflated 40%) 2023-01-11T22:51:06.7083005Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224012.xml (deflated 41%) 2023-01-11T22:51:06.7083793Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224014.xml (deflated 40%) 2023-01-11T22:51:06.7084566Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224022.xml (deflated 40%) 2023-01-11T22:51:06.7085325Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20230111224029.xml (deflated 39%) 2023-01-11T22:51:06.7086053Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20230111224031.xml (deflated 39%) 2023-01-11T22:51:06.7086781Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20230111224033.xml (deflated 39%) 2023-01-11T22:51:06.7087494Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20230111224035.xml (deflated 39%) 2023-01-11T22:51:06.7088219Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20230111224037.xml (deflated 38%) 2023-01-11T22:51:06.7088942Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20230111224040.xml (deflated 40%) 2023-01-11T22:51:06.7089690Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-RendezvousEnvTest-20230111224042.xml (deflated 39%) 2023-01-11T22:51:06.7090416Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-TimeoutTest-20230111224046.xml (deflated 41%) 2023-01-11T22:51:06.7091153Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_core/TEST-TestHooks-20230111224050.xml (deflated 79%) 2023-01-11T22:51:06.7092017Z adding: 
test/test-reports/python-unittest/distributed.fsdp.test_fsdp_core/TEST-TestNoGrad-20230111224050.xml (deflated 64%) 2023-01-11T22:51:06.7092847Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_core/TEST-TestParamInit-20230111224050.xml (deflated 61%) 2023-01-11T22:51:06.7093622Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_core/TEST-TestParityWithDDP-20230111224050.xml (deflated 90%) 2023-01-11T22:51:06.7094440Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111211839.xml (deflated 41%) 2023-01-11T22:51:06.7095269Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111211848.xml (deflated 42%) 2023-01-11T22:51:06.7096100Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111211851.xml (deflated 42%) 2023-01-11T22:51:06.7097126Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111211858.xml (deflated 43%) 2023-01-11T22:51:06.7097967Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111211902.xml (deflated 41%) 2023-01-11T22:51:06.7098793Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111211910.xml (deflated 41%) 2023-01-11T22:51:06.7099618Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111211919.xml (deflated 40%) 2023-01-11T22:51:06.7100427Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111211929.xml (deflated 40%) 2023-01-11T22:51:06.7101253Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111211938.xml (deflated 40%) 2023-01-11T22:51:06.7102086Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111211946.xml (deflated 39%) 2023-01-11T22:51:06.7102914Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111211955.xml (deflated 39%) 2023-01-11T22:51:06.7103720Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212005.xml (deflated 40%) 2023-01-11T22:51:06.7104545Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212014.xml (deflated 40%) 2023-01-11T22:51:06.7105368Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212023.xml (deflated 42%) 2023-01-11T22:51:06.7106191Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212027.xml (deflated 42%) 2023-01-11T22:51:06.7106993Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212033.xml (deflated 42%) 2023-01-11T22:51:06.7107829Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212038.xml (deflated 42%) 2023-01-11T22:51:06.7108654Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212046.xml (deflated 42%) 2023-01-11T22:51:06.7109481Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212049.xml (deflated 45%) 2023-01-11T22:51:06.7110293Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212051.xml (deflated 47%) 2023-01-11T22:51:06.7111117Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212053.xml (deflated 48%) 2023-01-11T22:51:06.7112024Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212056.xml (deflated 45%) 2023-01-11T22:51:06.7112875Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212058.xml (deflated 40%) 2023-01-11T22:51:06.7113753Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212107.xml (deflated 44%) 2023-01-11T22:51:06.7114575Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212110.xml (deflated 44%) 2023-01-11T22:51:06.7115396Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212112.xml (deflated 44%) 2023-01-11T22:51:06.7116213Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212114.xml (deflated 44%) 2023-01-11T22:51:06.7117013Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212117.xml (deflated 44%) 2023-01-11T22:51:06.7117844Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212119.xml (deflated 41%) 2023-01-11T22:51:06.7118674Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212128.xml (deflated 42%) 2023-01-11T22:51:06.7119494Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212131.xml (deflated 42%) 2023-01-11T22:51:06.7120297Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212133.xml (deflated 41%) 2023-01-11T22:51:06.7121126Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212142.xml (deflated 42%) 2023-01-11T22:51:06.7121949Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212149.xml (deflated 43%) 2023-01-11T22:51:06.7122780Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212151.xml (deflated 43%) 2023-01-11T22:51:06.7123589Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212154.xml (deflated 42%) 2023-01-11T22:51:06.7124408Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212156.xml (deflated 42%) 2023-01-11T22:51:06.7125234Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212159.xml (deflated 40%) 2023-01-11T22:51:06.7126113Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212208.xml (deflated 40%) 2023-01-11T22:51:06.7126935Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212219.xml (deflated 43%) 2023-01-11T22:51:06.7127763Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212221.xml (deflated 41%) 2023-01-11T22:51:06.7128575Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212223.xml (deflated 41%) 2023-01-11T22:51:06.7129400Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212226.xml (deflated 41%) 2023-01-11T22:51:06.7130226Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212228.xml (deflated 41%) 2023-01-11T22:51:06.7131048Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212231.xml (deflated 41%) 2023-01-11T22:51:06.7131859Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212233.xml (deflated 41%) 2023-01-11T22:51:06.7132740Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212235.xml (deflated 41%) 2023-01-11T22:51:06.7133641Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212238.xml (deflated 41%) 2023-01-11T22:51:06.7134467Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212240.xml (deflated 41%) 2023-01-11T22:51:06.7135274Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212243.xml (deflated 41%) 2023-01-11T22:51:06.7136100Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212249.xml (deflated 41%) 2023-01-11T22:51:06.7137209Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212252.xml (deflated 42%) 2023-01-11T22:51:06.7138044Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212254.xml (deflated 41%) 2023-01-11T22:51:06.7138845Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212257.xml (deflated 41%) 2023-01-11T22:51:06.7139680Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212303.xml (deflated 40%) 2023-01-11T22:51:06.7140503Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212313.xml (deflated 41%) 2023-01-11T22:51:06.7141329Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212322.xml (deflated 41%) 2023-01-11T22:51:06.7142132Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212331.xml (deflated 40%) 2023-01-11T22:51:06.7142953Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212340.xml (deflated 42%) 2023-01-11T22:51:06.7143778Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212347.xml (deflated 42%) 2023-01-11T22:51:06.7144602Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212354.xml (deflated 42%) 2023-01-11T22:51:06.7145397Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212400.xml (deflated 42%) 2023-01-11T22:51:06.7146211Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212407.xml (deflated 41%) 2023-01-11T22:51:06.7147029Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212416.xml (deflated 41%) 2023-01-11T22:51:06.7147855Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212425.xml (deflated 40%) 2023-01-11T22:51:06.7148663Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212435.xml (deflated 41%) 2023-01-11T22:51:06.7149495Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212444.xml (deflated 40%) 2023-01-11T22:51:06.7150320Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212453.xml (deflated 41%) 2023-01-11T22:51:06.7151146Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212502.xml (deflated 41%) 2023-01-11T22:51:06.7151946Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212511.xml (deflated 41%) 2023-01-11T22:51:06.7152770Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212520.xml (deflated 41%) 2023-01-11T22:51:06.7153679Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212529.xml (deflated 41%) 2023-01-11T22:51:06.7154576Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212532.xml (deflated 41%) 2023-01-11T22:51:06.7155377Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212534.xml (deflated 40%) 2023-01-11T22:51:06.7156203Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212536.xml (deflated 42%) 2023-01-11T22:51:06.7157030Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212539.xml (deflated 42%) 2023-01-11T22:51:06.7157859Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212541.xml (deflated 42%) 2023-01-11T22:51:06.7158670Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212543.xml (deflated 41%) 2023-01-11T22:51:06.7159503Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212546.xml (deflated 42%) 2023-01-11T22:51:06.7160334Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212548.xml (deflated 42%) 2023-01-11T22:51:06.7161163Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212551.xml (deflated 43%) 2023-01-11T22:51:06.7161968Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212553.xml (deflated 42%) 2023-01-11T22:51:06.7162797Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212555.xml (deflated 42%) 2023-01-11T22:51:06.7163618Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212558.xml (deflated 42%) 2023-01-11T22:51:06.7164445Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212600.xml (deflated 43%) 2023-01-11T22:51:06.7165275Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212602.xml (deflated 42%) 2023-01-11T22:51:06.7166081Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212605.xml (deflated 42%) 2023-01-11T22:51:06.7166908Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212607.xml (deflated 42%) 2023-01-11T22:51:06.7167734Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212610.xml (deflated 42%) 2023-01-11T22:51:06.7168554Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212612.xml (deflated 43%) 2023-01-11T22:51:06.7169365Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212614.xml (deflated 42%) 2023-01-11T22:51:06.7170196Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212617.xml (deflated 42%) 2023-01-11T22:51:06.7171014Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212619.xml (deflated 42%) 2023-01-11T22:51:06.7171828Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212622.xml (deflated 42%) 2023-01-11T22:51:06.7172632Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212624.xml (deflated 42%) 2023-01-11T22:51:06.7173456Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212626.xml (deflated 42%) 2023-01-11T22:51:06.7174322Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212629.xml (deflated 42%) 2023-01-11T22:51:06.7175201Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212631.xml (deflated 43%) 2023-01-11T22:51:06.7176003Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212634.xml (deflated 41%) 2023-01-11T22:51:06.7177072Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212643.xml (deflated 42%) 2023-01-11T22:51:06.7177908Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212650.xml (deflated 42%) 2023-01-11T22:51:06.7178737Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212653.xml (deflated 42%) 2023-01-11T22:51:06.7179544Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212655.xml (deflated 40%) 2023-01-11T22:51:06.7180378Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212704.xml (deflated 42%) 2023-01-11T22:51:06.7181204Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212707.xml (deflated 42%) 2023-01-11T22:51:06.7182028Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212714.xml (deflated 42%) 2023-01-11T22:51:06.7182835Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212716.xml (deflated 42%) 2023-01-11T22:51:06.7183657Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212723.xml (deflated 42%) 2023-01-11T22:51:06.7184487Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212725.xml (deflated 43%) 2023-01-11T22:51:06.7185312Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212727.xml (deflated 43%) 2023-01-11T22:51:06.7186114Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212730.xml (deflated 41%) 2023-01-11T22:51:06.7186941Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212732.xml (deflated 41%) 2023-01-11T22:51:06.7187765Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212734.xml (deflated 41%) 2023-01-11T22:51:06.7188589Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212737.xml (deflated 41%) 2023-01-11T22:51:06.7189401Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212739.xml (deflated 40%) 2023-01-11T22:51:06.7190231Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212742.xml (deflated 41%) 2023-01-11T22:51:06.7191060Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212744.xml (deflated 41%) 2023-01-11T22:51:06.7191933Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212746.xml (deflated 40%) 2023-01-11T22:51:06.7192742Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212749.xml (deflated 41%) 2023-01-11T22:51:06.7193561Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212751.xml (deflated 41%) 2023-01-11T22:51:06.7194388Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212754.xml (deflated 41%) 2023-01-11T22:51:06.7195302Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212803.xml (deflated 42%) 2023-01-11T22:51:06.7196186Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212805.xml (deflated 40%) 2023-01-11T22:51:06.7197010Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212814.xml (deflated 42%) 2023-01-11T22:51:06.7197834Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212821.xml (deflated 41%) 2023-01-11T22:51:06.7198661Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212830.xml (deflated 42%) 2023-01-11T22:51:06.7199467Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212834.xml (deflated 42%) 2023-01-11T22:51:06.7200300Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212838.xml (deflated 42%) 2023-01-11T22:51:06.7201123Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212842.xml (deflated 40%) 2023-01-11T22:51:06.7201933Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212853.xml (deflated 40%) 2023-01-11T22:51:06.7202739Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212902.xml (deflated 40%) 2023-01-11T22:51:06.7203562Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212913.xml (deflated 41%) 2023-01-11T22:51:06.7204383Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212922.xml (deflated 40%) 2023-01-11T22:51:06.7205199Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212931.xml (deflated 42%) 2023-01-11T22:51:06.7206003Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212935.xml (deflated 42%) 2023-01-11T22:51:06.7206817Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212940.xml (deflated 40%) 2023-01-11T22:51:06.7207639Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212948.xml (deflated 40%) 2023-01-11T22:51:06.7208459Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212957.xml (deflated 40%) 2023-01-11T22:51:06.7209282Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213006.xml (deflated 40%) 2023-01-11T22:51:06.7210093Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213015.xml (deflated 42%) 2023-01-11T22:51:06.7210912Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213019.xml (deflated 41%) 2023-01-11T22:51:06.7211736Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213028.xml (deflated 42%) 2023-01-11T22:51:06.7212561Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213032.xml (deflated 41%) 2023-01-11T22:51:06.7213364Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213041.xml (deflated 42%) 2023-01-11T22:51:06.7214190Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213045.xml (deflated 42%) 2023-01-11T22:51:06.7215010Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213050.xml (deflated 40%) 2023-01-11T22:51:06.7215889Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213059.xml (deflated 40%) 2023-01-11T22:51:06.7216992Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213109.xml (deflated 42%) 2023-01-11T22:51:06.7217832Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213113.xml (deflated 41%) 2023-01-11T22:51:06.7218657Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213122.xml (deflated 42%) 2023-01-11T22:51:06.7219483Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213125.xml (deflated 42%) 2023-01-11T22:51:06.7220286Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213127.xml (deflated 42%) 2023-01-11T22:51:06.7221116Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213129.xml (deflated 41%) 2023-01-11T22:51:06.7221946Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213132.xml (deflated 41%) 2023-01-11T22:51:06.7222770Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213134.xml (deflated 41%) 2023-01-11T22:51:06.7223577Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213137.xml (deflated 41%) 2023-01-11T22:51:06.7224409Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213139.xml (deflated 41%) 2023-01-11T22:51:06.7225231Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213141.xml (deflated 41%) 2023-01-11T22:51:06.7226059Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213144.xml (deflated 42%) 2023-01-11T22:51:06.7226870Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213146.xml (deflated 42%) 2023-01-11T22:51:06.7227693Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213148.xml (deflated 42%) 2023-01-11T22:51:06.7228510Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213151.xml (deflated 42%) 2023-01-11T22:51:06.7229334Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213155.xml (deflated 41%) 2023-01-11T22:51:06.7230242Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213205.xml (deflated 41%) 2023-01-11T22:51:06.7231077Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213214.xml (deflated 40%) 2023-01-11T22:51:06.7231890Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213223.xml (deflated 41%) 2023-01-11T22:51:06.7232704Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213232.xml (deflated 40%) 2023-01-11T22:51:06.7233502Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213240.xml (deflated 40%) 2023-01-11T22:51:06.7234309Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213259.xml (deflated 41%) 2023-01-11T22:51:06.7235110Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213309.xml (deflated 41%) 2023-01-11T22:51:06.7236023Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213318.xml (deflated 41%) 2023-01-11T22:51:06.7236917Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213327.xml (deflated 40%) 2023-01-11T22:51:06.7237744Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213337.xml (deflated 42%) 2023-01-11T22:51:06.7238546Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213341.xml (deflated 42%) 2023-01-11T22:51:06.7239370Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213345.xml (deflated 42%) 2023-01-11T22:51:06.7240185Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213355.xml (deflated 40%) 2023-01-11T22:51:06.7240993Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213403.xml (deflated 42%) 2023-01-11T22:51:06.7241790Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213408.xml (deflated 41%) 2023-01-11T22:51:06.7242598Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213417.xml (deflated 42%) 2023-01-11T22:51:06.7243395Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213421.xml (deflated 41%) 2023-01-11T22:51:06.7244221Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213431.xml (deflated 41%) 2023-01-11T22:51:06.7245046Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213440.xml (deflated 41%) 2023-01-11T22:51:06.7245850Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213448.xml (deflated 42%) 2023-01-11T22:51:06.7246674Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213452.xml (deflated 42%) 2023-01-11T22:51:06.7247496Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213457.xml (deflated 42%) 2023-01-11T22:51:06.7248318Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213501.xml (deflated 40%) 2023-01-11T22:51:06.7249125Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213510.xml (deflated 41%) 2023-01-11T22:51:06.7249949Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213519.xml (deflated 40%) 2023-01-11T22:51:06.7250770Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213525.xml (deflated 41%) 2023-01-11T22:51:06.7251595Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213532.xml (deflated 42%) 2023-01-11T22:51:06.7252402Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213536.xml (deflated 42%) 2023-01-11T22:51:06.7253225Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213541.xml (deflated 40%) 2023-01-11T22:51:06.7254037Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213547.xml (deflated 41%) 2023-01-11T22:51:06.7254860Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213550.xml (deflated 41%) 2023-01-11T22:51:06.7255667Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213552.xml (deflated 42%) 2023-01-11T22:51:06.7256719Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213555.xml (deflated 41%) 2023-01-11T22:51:06.7257652Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213557.xml (deflated 41%) 2023-01-11T22:51:06.7258476Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213559.xml (deflated 40%) 2023-01-11T22:51:06.7259280Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213602.xml (deflated 41%) 2023-01-11T22:51:06.7260107Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213604.xml (deflated 41%) 2023-01-11T22:51:06.7260929Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213611.xml (deflated 42%) 2023-01-11T22:51:06.7261758Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213613.xml (deflated 41%) 2023-01-11T22:51:06.7262561Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213620.xml (deflated 40%) 2023-01-11T22:51:06.7263391Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213627.xml (deflated 41%) 2023-01-11T22:51:06.7264212Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213634.xml (deflated 40%) 2023-01-11T22:51:06.7265036Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213643.xml (deflated 41%) 2023-01-11T22:51:06.7265841Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213652.xml (deflated 41%) 2023-01-11T22:51:06.7266666Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213701.xml (deflated 41%) 2023-01-11T22:51:06.7267492Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213710.xml (deflated 41%) 2023-01-11T22:51:06.7268388Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213719.xml (deflated 41%) 2023-01-11T22:51:06.7269192Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213744.xml (deflated 40%) 2023-01-11T22:51:06.7270020Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213810.xml (deflated 42%) 2023-01-11T22:51:06.7270835Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213812.xml (deflated 42%) 2023-01-11T22:51:06.7271659Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213815.xml (deflated 41%) 2023-01-11T22:51:06.7272470Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213817.xml (deflated 41%) 2023-01-11T22:51:06.7273300Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213820.xml (deflated 41%) 2023-01-11T22:51:06.7274121Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213822.xml (deflated 42%) 2023-01-11T22:51:06.7274949Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213824.xml (deflated 42%) 2023-01-11T22:51:06.7275756Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213827.xml (deflated 42%) 2023-01-11T22:51:06.7276582Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213829.xml (deflated 42%) 2023-01-11T22:51:06.7277506Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213832.xml (deflated 42%) 2023-01-11T22:51:06.7278414Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213834.xml (deflated 42%) 2023-01-11T22:51:06.7279218Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213836.xml (deflated 42%) 2023-01-11T22:51:06.7280043Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213839.xml (deflated 42%) 2023-01-11T22:51:06.7280873Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213841.xml (deflated 40%) 2023-01-11T22:51:06.7281695Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213848.xml (deflated 40%) 2023-01-11T22:51:06.7282503Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213855.xml (deflated 41%) 2023-01-11T22:51:06.7283331Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213857.xml (deflated 42%) 2023-01-11T22:51:06.7284155Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213900.xml (deflated 42%) 2023-01-11T22:51:06.7284976Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213904.xml (deflated 40%) 2023-01-11T22:51:06.7285778Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213913.xml (deflated 41%) 2023-01-11T22:51:06.7286604Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213923.xml (deflated 40%) 2023-01-11T22:51:06.7287434Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213932.xml (deflated 42%) 2023-01-11T22:51:06.7288261Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213937.xml (deflated 41%) 2023-01-11T22:51:06.7289093Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213941.xml (deflated 40%) 2023-01-11T22:51:06.7289897Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213948.xml (deflated 40%) 2023-01-11T22:51:06.7290719Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213954.xml (deflated 42%) 2023-01-11T22:51:06.7291546Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213959.xml (deflated 40%) 2023-01-11T22:51:06.7292422Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214008.xml (deflated 40%) 2023-01-11T22:51:06.7293226Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214017.xml (deflated 41%) 2023-01-11T22:51:06.7294058Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214026.xml (deflated 41%) 2023-01-11T22:51:06.7294878Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214035.xml (deflated 42%) 2023-01-11T22:51:06.7295701Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214042.xml (deflated 42%) 2023-01-11T22:51:06.7296506Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214048.xml (deflated 42%) 2023-01-11T22:51:06.7297546Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214055.xml (deflated 42%) 2023-01-11T22:51:06.7298444Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214102.xml (deflated 41%) 2023-01-11T22:51:06.7299342Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214111.xml (deflated 41%) 2023-01-11T22:51:06.7300152Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214120.xml (deflated 43%) 2023-01-11T22:51:06.7300977Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214123.xml (deflated 41%) 2023-01-11T22:51:06.7301802Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214132.xml (deflated 43%) 2023-01-11T22:51:06.7302626Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214134.xml (deflated 43%) 2023-01-11T22:51:06.7303437Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214137.xml (deflated 40%) 2023-01-11T22:51:06.7304266Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214146.xml (deflated 42%) 2023-01-11T22:51:06.7305089Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214148.xml (deflated 42%) 2023-01-11T22:51:06.7305907Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214151.xml (deflated 40%) 2023-01-11T22:51:06.7306713Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214200.xml (deflated 41%) 2023-01-11T22:51:06.7307543Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214202.xml (deflated 41%) 2023-01-11T22:51:06.7308367Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214205.xml (deflated 41%) 2023-01-11T22:51:06.7309195Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214207.xml (deflated 42%) 2023-01-11T22:51:06.7310003Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214209.xml (deflated 41%) 2023-01-11T22:51:06.7310831Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214212.xml (deflated 41%) 2023-01-11T22:51:06.7311655Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214214.xml (deflated 41%) 2023-01-11T22:51:06.7312480Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214217.xml (deflated 41%) 2023-01-11T22:51:06.7313296Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214219.xml (deflated 41%) 2023-01-11T22:51:06.7314165Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214228.xml (deflated 42%) 2023-01-11T22:51:06.7314994Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214230.xml (deflated 42%) 2023-01-11T22:51:06.7315846Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214233.xml (deflated 42%) 2023-01-11T22:51:06.7316648Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214235.xml (deflated 41%) 2023-01-11T22:51:06.7317471Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214244.xml (deflated 41%) 2023-01-11T22:51:06.7318291Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214247.xml (deflated 41%) 2023-01-11T22:51:06.7319166Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214249.xml (deflated 41%) 2023-01-11T22:51:06.7320034Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214251.xml (deflated 41%) 2023-01-11T22:51:06.7320839Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214301.xml (deflated 41%) 2023-01-11T22:51:06.7321661Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214309.xml (deflated 40%) 2023-01-11T22:51:06.7322489Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214319.xml (deflated 40%) 2023-01-11T22:51:06.7323313Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214328.xml (deflated 42%) 2023-01-11T22:51:06.7324125Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214330.xml (deflated 42%) 2023-01-11T22:51:06.7324958Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214333.xml (deflated 41%) 2023-01-11T22:51:06.7325780Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214342.xml (deflated 41%) 2023-01-11T22:51:06.7326601Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214351.xml (deflated 41%) 2023-01-11T22:51:06.7327409Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214353.xml (deflated 41%) 2023-01-11T22:51:06.7328233Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214402.xml (deflated 41%) 2023-01-11T22:51:06.7329068Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214416.xml (deflated 40%) 2023-01-11T22:51:06.7329893Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214432.xml (deflated 41%) 2023-01-11T22:51:06.7330694Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214442.xml (deflated 42%) 2023-01-11T22:51:06.7331516Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214444.xml (deflated 42%) 2023-01-11T22:51:06.7332336Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214451.xml (deflated 43%) 2023-01-11T22:51:06.7333157Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214455.xml (deflated 42%) 2023-01-11T22:51:06.7333960Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214504.xml (deflated 41%) 2023-01-11T22:51:06.7334786Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214513.xml (deflated 40%) 2023-01-11T22:51:06.7335603Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214523.xml (deflated 40%) 2023-01-11T22:51:06.7336420Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214531.xml (deflated 41%) 2023-01-11T22:51:06.7337545Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214540.xml (deflated 39%) 2023-01-11T22:51:06.7338372Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214549.xml (deflated 39%) 2023-01-11T22:51:06.7339196Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214558.xml (deflated 40%) 2023-01-11T22:51:06.7340102Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214608.xml (deflated 40%) 2023-01-11T22:51:06.7340978Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214616.xml (deflated 42%) 2023-01-11T22:51:06.7341803Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214620.xml (deflated 41%) 2023-01-11T22:51:06.7342631Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214627.xml (deflated 42%) 2023-01-11T22:51:06.7343461Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214632.xml (deflated 42%) 2023-01-11T22:51:06.7344262Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214640.xml (deflated 42%) 2023-01-11T22:51:06.7345099Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214643.xml (deflated 45%) 2023-01-11T22:51:06.7345925Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214645.xml (deflated 46%) 2023-01-11T22:51:06.7346742Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214648.xml (deflated 48%) 2023-01-11T22:51:06.7347547Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214650.xml (deflated 45%) 2023-01-11T22:51:06.7348373Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214652.xml (deflated 40%) 2023-01-11T22:51:06.7349193Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214701.xml (deflated 43%) 2023-01-11T22:51:06.7350018Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214704.xml (deflated 43%) 2023-01-11T22:51:06.7350829Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214706.xml (deflated 43%) 2023-01-11T22:51:06.7351656Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214709.xml (deflated 43%) 2023-01-11T22:51:06.7352513Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214711.xml (deflated 43%) 2023-01-11T22:51:06.7353342Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214713.xml (deflated 40%) 2023-01-11T22:51:06.7354168Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214722.xml (deflated 41%) 2023-01-11T22:51:06.7354977Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214725.xml (deflated 41%) 2023-01-11T22:51:06.7355803Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214727.xml (deflated 40%) 2023-01-11T22:51:06.7356627Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214736.xml (deflated 41%) 2023-01-11T22:51:06.7357450Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214743.xml (deflated 42%) 2023-01-11T22:51:06.7358256Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214745.xml (deflated 42%) 2023-01-11T22:51:06.7359086Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214748.xml (deflated 42%) 2023-01-11T22:51:06.7360002Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214750.xml (deflated 42%) 2023-01-11T22:51:06.7360900Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214753.xml (deflated 40%) 2023-01-11T22:51:06.7361707Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214802.xml (deflated 40%) 2023-01-11T22:51:06.7362536Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214812.xml (deflated 43%) 2023-01-11T22:51:06.7363366Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214815.xml (deflated 41%) 2023-01-11T22:51:06.7364191Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214817.xml (deflated 41%) 2023-01-11T22:51:06.7365475Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214820.xml (deflated 41%) 2023-01-11T22:51:06.7366601Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214822.xml (deflated 41%) 2023-01-11T22:51:06.7367382Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214824.xml (deflated 41%) 2023-01-11T22:51:06.7368160Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214827.xml (deflated 41%) 2023-01-11T22:51:06.7368965Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214829.xml (deflated 41%) 2023-01-11T22:51:06.7369723Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214832.xml (deflated 41%) 2023-01-11T22:51:06.7370497Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214834.xml (deflated 41%) 2023-01-11T22:51:06.7371277Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214836.xml (deflated 41%) 2023-01-11T22:51:06.7372047Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214843.xml (deflated 41%) 2023-01-11T22:51:06.7372804Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214846.xml (deflated 41%) 2023-01-11T22:51:06.7373569Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214848.xml (deflated 41%) 2023-01-11T22:51:06.7374338Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214850.xml (deflated 40%) 2023-01-11T22:51:06.7375149Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214857.xml (deflated 40%) 2023-01-11T22:51:06.7375906Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214906.xml (deflated 40%) 2023-01-11T22:51:06.7376879Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214916.xml (deflated 41%) 2023-01-11T22:51:06.7377668Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214925.xml (deflated 41%) 2023-01-11T22:51:06.7378434Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214934.xml (deflated 42%) 2023-01-11T22:51:06.7379187Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214941.xml (deflated 42%) 2023-01-11T22:51:06.7379952Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214947.xml (deflated 41%) 2023-01-11T22:51:06.7380819Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214954.xml (deflated 42%) 2023-01-11T22:51:06.7381648Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215001.xml (deflated 41%) 2023-01-11T22:51:06.7382400Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215010.xml (deflated 41%) 2023-01-11T22:51:06.7383162Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215019.xml (deflated 40%) 2023-01-11T22:51:06.7383932Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215029.xml (deflated 41%) 2023-01-11T22:51:06.7384697Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215038.xml (deflated 40%) 2023-01-11T22:51:06.7385458Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215047.xml (deflated 41%) 2023-01-11T22:51:06.7386232Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215056.xml (deflated 41%) 2023-01-11T22:51:06.7387002Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215105.xml (deflated 40%) 2023-01-11T22:51:06.7387769Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215114.xml (deflated 41%) 2023-01-11T22:51:06.7388514Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215123.xml (deflated 41%) 2023-01-11T22:51:06.7389277Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215126.xml (deflated 41%) 2023-01-11T22:51:06.7390129Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215128.xml (deflated 40%) 2023-01-11T22:51:06.7390903Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215131.xml (deflated 42%) 2023-01-11T22:51:06.7391670Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215133.xml (deflated 42%) 2023-01-11T22:51:06.7392492Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215135.xml (deflated 42%) 2023-01-11T22:51:06.7393247Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215138.xml (deflated 41%) 2023-01-11T22:51:06.7394012Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215140.xml (deflated 43%) 2023-01-11T22:51:06.7394777Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215143.xml (deflated 42%) 2023-01-11T22:51:06.7395540Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215145.xml (deflated 43%) 2023-01-11T22:51:06.7396292Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215147.xml (deflated 42%) 2023-01-11T22:51:06.7397058Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215150.xml (deflated 43%) 2023-01-11T22:51:06.7397822Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215152.xml (deflated 42%) 2023-01-11T22:51:06.7398583Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215154.xml (deflated 43%) 2023-01-11T22:51:06.7399345Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215157.xml (deflated 42%) 2023-01-11T22:51:06.7400150Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215159.xml (deflated 42%) 2023-01-11T22:51:06.7400961Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215202.xml (deflated 42%) 2023-01-11T22:51:06.7401725Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215204.xml (deflated 42%) 2023-01-11T22:51:06.7402485Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215206.xml (deflated 43%) 2023-01-11T22:51:06.7403233Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215209.xml (deflated 42%) 2023-01-11T22:51:06.7403994Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215211.xml (deflated 42%) 2023-01-11T22:51:06.7404754Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215214.xml (deflated 42%) 2023-01-11T22:51:06.7405524Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215216.xml (deflated 42%) 2023-01-11T22:51:06.7406274Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215218.xml (deflated 42%) 2023-01-11T22:51:06.7407034Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215221.xml (deflated 42%) 2023-01-11T22:51:06.7407794Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215223.xml (deflated 42%) 2023-01-11T22:51:06.7408554Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215226.xml (deflated 42%) 2023-01-11T22:51:06.7409308Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215228.xml (deflated 41%) 2023-01-11T22:51:06.7410072Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215238.xml (deflated 42%) 2023-01-11T22:51:06.7410834Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215245.xml (deflated 42%) 2023-01-11T22:51:06.7411599Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215247.xml (deflated 42%) 2023-01-11T22:51:06.7412347Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215250.xml (deflated 40%) 2023-01-11T22:51:06.7413104Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215259.xml (deflated 42%) 2023-01-11T22:51:06.7413869Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215301.xml (deflated 42%) 2023-01-11T22:51:06.7414639Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215308.xml (deflated 42%) 2023-01-11T22:51:06.7415392Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215311.xml (deflated 42%) 2023-01-11T22:51:06.7416157Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215317.xml (deflated 42%) 2023-01-11T22:51:06.7417111Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215320.xml (deflated 43%) 2023-01-11T22:51:06.7417868Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215322.xml (deflated 43%) 2023-01-11T22:51:06.7418610Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215325.xml (deflated 41%) 2023-01-11T22:51:06.7419439Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215327.xml (deflated 41%) 2023-01-11T22:51:06.7420268Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215329.xml (deflated 41%) 2023-01-11T22:51:06.7421017Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215332.xml (deflated 41%) 2023-01-11T22:51:06.7421755Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215334.xml (deflated 40%) 2023-01-11T22:51:06.7422505Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215337.xml (deflated 41%) 2023-01-11T22:51:06.7423247Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215339.xml (deflated 41%) 2023-01-11T22:51:06.7423997Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215341.xml (deflated 40%) 2023-01-11T22:51:06.7424746Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215344.xml (deflated 41%) 2023-01-11T22:51:06.7425492Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215346.xml (deflated 41%) 2023-01-11T22:51:06.7426245Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215349.xml (deflated 40%) 2023-01-11T22:51:06.7426984Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215358.xml (deflated 41%) 2023-01-11T22:51:06.7427727Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215400.xml (deflated 40%) 2023-01-11T22:51:06.7428474Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215410.xml (deflated 42%) 2023-01-11T22:51:06.7429227Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215416.xml (deflated 40%) 2023-01-11T22:51:06.7429976Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215425.xml (deflated 42%) 2023-01-11T22:51:06.7430710Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215429.xml (deflated 42%) 2023-01-11T22:51:06.7431460Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215433.xml (deflated 42%) 2023-01-11T22:51:06.7432203Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215438.xml (deflated 41%) 2023-01-11T22:51:06.7432947Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215448.xml (deflated 40%) 2023-01-11T22:51:06.7433687Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215458.xml (deflated 40%) 2023-01-11T22:51:06.7434442Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215508.xml (deflated 40%) 2023-01-11T22:51:06.7435193Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215518.xml (deflated 40%) 2023-01-11T22:51:06.7435953Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215527.xml (deflated 42%) 2023-01-11T22:51:06.7436701Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215531.xml (deflated 42%) 2023-01-11T22:51:06.7437461Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215535.xml (deflated 40%) 2023-01-11T22:51:06.7438268Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215544.xml (deflated 40%) 2023-01-11T22:51:06.7439071Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215552.xml (deflated 41%) 2023-01-11T22:51:06.7439807Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215602.xml (deflated 40%) 2023-01-11T22:51:06.7440554Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215611.xml (deflated 42%) 2023-01-11T22:51:06.7441309Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215615.xml (deflated 41%) 2023-01-11T22:51:06.7442073Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215624.xml (deflated 42%) 2023-01-11T22:51:06.7442820Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215628.xml (deflated 41%) 2023-01-11T22:51:06.7443586Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215637.xml (deflated 42%) 2023-01-11T22:51:06.7444346Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215641.xml (deflated 42%) 2023-01-11T22:51:06.7445108Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215645.xml (deflated 40%) 2023-01-11T22:51:06.7445866Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215655.xml (deflated 40%) 2023-01-11T22:51:06.7446613Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215704.xml (deflated 42%) 2023-01-11T22:51:06.7447379Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215709.xml (deflated 40%) 2023-01-11T22:51:06.7448147Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215718.xml (deflated 42%) 2023-01-11T22:51:06.7448911Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215720.xml (deflated 42%) 2023-01-11T22:51:06.7449660Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215723.xml (deflated 42%) 2023-01-11T22:51:06.7450426Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215725.xml (deflated 41%) 2023-01-11T22:51:06.7451188Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215727.xml (deflated 41%) 2023-01-11T22:51:06.7451955Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215730.xml (deflated 41%) 2023-01-11T22:51:06.7452707Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215732.xml (deflated 41%) 2023-01-11T22:51:06.7453460Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215735.xml (deflated 41%) 2023-01-11T22:51:06.7454225Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215737.xml (deflated 41%) 2023-01-11T22:51:06.7454991Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215739.xml (deflated 41%) 2023-01-11T22:51:06.7455738Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215742.xml (deflated 42%) 2023-01-11T22:51:06.7456704Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215744.xml (deflated 42%) 2023-01-11T22:51:06.7457510Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215747.xml (deflated 42%) 2023-01-11T22:51:06.7458352Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215751.xml (deflated 41%) 2023-01-11T22:51:06.7459100Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215800.xml (deflated 41%) 2023-01-11T22:51:06.7459861Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215809.xml (deflated 41%) 2023-01-11T22:51:06.7460623Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215818.xml (deflated 40%) 2023-01-11T22:51:06.7461382Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215827.xml (deflated 40%) 2023-01-11T22:51:06.7462137Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215836.xml (deflated 40%) 2023-01-11T22:51:06.7462904Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215855.xml (deflated 41%) 2023-01-11T22:51:06.7463666Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215904.xml (deflated 41%) 2023-01-11T22:51:06.7464427Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215914.xml (deflated 41%) 2023-01-11T22:51:06.7465173Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215923.xml (deflated 40%) 2023-01-11T22:51:06.7465938Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215932.xml (deflated 42%) 2023-01-11T22:51:06.7466705Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215936.xml (deflated 42%) 2023-01-11T22:51:06.7467471Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215940.xml (deflated 41%) 2023-01-11T22:51:06.7468217Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215950.xml (deflated 40%) 2023-01-11T22:51:06.7468980Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215959.xml (deflated 42%) 2023-01-11T22:51:06.7469743Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220003.xml (deflated 41%) 2023-01-11T22:51:06.7470505Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220012.xml (deflated 42%) 2023-01-11T22:51:06.7471254Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220016.xml (deflated 41%) 2023-01-11T22:51:06.7472019Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220026.xml (deflated 41%) 2023-01-11T22:51:06.7472785Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220035.xml (deflated 41%) 2023-01-11T22:51:06.7473545Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220044.xml (deflated 42%) 2023-01-11T22:51:06.7474295Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220048.xml (deflated 42%) 2023-01-11T22:51:06.7475057Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220052.xml (deflated 42%) 2023-01-11T22:51:06.7475880Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220056.xml (deflated 41%) 2023-01-11T22:51:06.7476702Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220106.xml (deflated 42%) 2023-01-11T22:51:06.7477446Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220114.xml (deflated 41%) 2023-01-11T22:51:06.7478208Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220121.xml (deflated 41%) 2023-01-11T22:51:06.7478970Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220128.xml (deflated 42%) 2023-01-11T22:51:06.7479730Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220132.xml (deflated 42%) 2023-01-11T22:51:06.7480478Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220136.xml (deflated 40%) 2023-01-11T22:51:06.7481246Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220143.xml (deflated 41%) 2023-01-11T22:51:06.7482009Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220146.xml (deflated 41%) 2023-01-11T22:51:06.7482754Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220148.xml (deflated 42%) 2023-01-11T22:51:06.7483496Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220150.xml (deflated 41%) 2023-01-11T22:51:06.7484252Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220153.xml (deflated 41%) 2023-01-11T22:51:06.7485003Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220155.xml (deflated 41%) 2023-01-11T22:51:06.7485765Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220158.xml (deflated 41%) 2023-01-11T22:51:06.7486512Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220200.xml (deflated 42%) 2023-01-11T22:51:06.7487273Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220207.xml (deflated 43%) 2023-01-11T22:51:06.7488022Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220209.xml (deflated 41%) 2023-01-11T22:51:06.7488771Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220216.xml (deflated 41%) 2023-01-11T22:51:06.7489506Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220223.xml (deflated 41%) 2023-01-11T22:51:06.7490264Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220230.xml (deflated 41%) 2023-01-11T22:51:06.7491025Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220239.xml (deflated 41%) 2023-01-11T22:51:06.7491784Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220248.xml (deflated 41%) 2023-01-11T22:51:06.7492596Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220257.xml (deflated 41%) 2023-01-11T22:51:06.7493342Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220306.xml (deflated 41%) 2023-01-11T22:51:06.7494102Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220315.xml (deflated 41%) 2023-01-11T22:51:06.7494914Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220340.xml (deflated 41%) 2023-01-11T22:51:06.7495730Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220406.xml (deflated 42%) 2023-01-11T22:51:06.7496477Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220408.xml (deflated 42%) 2023-01-11T22:51:06.7497430Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220411.xml (deflated 41%) 2023-01-11T22:51:06.7498198Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220413.xml (deflated 41%) 2023-01-11T22:51:06.7498964Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220416.xml (deflated 41%) 2023-01-11T22:51:06.7499720Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220418.xml (deflated 42%) 2023-01-11T22:51:06.7500489Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220420.xml (deflated 42%) 2023-01-11T22:51:06.7501252Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220423.xml (deflated 42%) 2023-01-11T22:51:06.7502010Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220425.xml (deflated 42%) 2023-01-11T22:51:06.7502758Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220428.xml (deflated 42%) 2023-01-11T22:51:06.7503522Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220430.xml (deflated 42%) 2023-01-11T22:51:06.7504283Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220432.xml (deflated 42%) 2023-01-11T22:51:06.7505050Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220435.xml (deflated 41%) 2023-01-11T22:51:06.7505804Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220437.xml (deflated 40%) 2023-01-11T22:51:06.7506568Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220444.xml (deflated 40%) 2023-01-11T22:51:06.7507331Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220451.xml (deflated 41%) 2023-01-11T22:51:06.7508095Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220453.xml (deflated 42%) 2023-01-11T22:51:06.7508845Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220456.xml (deflated 42%) 2023-01-11T22:51:06.7509613Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220500.xml (deflated 41%) 2023-01-11T22:51:06.7510379Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220509.xml (deflated 41%) 2023-01-11T22:51:06.7511142Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220518.xml (deflated 41%) 2023-01-11T22:51:06.7511887Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220528.xml (deflated 42%) 2023-01-11T22:51:06.7512651Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220532.xml (deflated 41%) 2023-01-11T22:51:06.7513414Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220537.xml (deflated 40%) 2023-01-11T22:51:06.7514249Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220543.xml (deflated 40%) 2023-01-11T22:51:06.7515062Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220550.xml (deflated 42%) 2023-01-11T22:51:06.7515824Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220554.xml (deflated 40%) 2023-01-11T22:51:06.7516586Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220604.xml (deflated 40%) 2023-01-11T22:51:06.7517344Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220613.xml (deflated 41%) 2023-01-11T22:51:06.7518089Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220622.xml (deflated 42%) 2023-01-11T22:51:06.7518855Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220631.xml (deflated 42%) 2023-01-11T22:51:06.7519615Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220638.xml (deflated 42%) 2023-01-11T22:51:06.7520378Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220645.xml (deflated 42%) 2023-01-11T22:51:06.7521124Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220652.xml (deflated 42%) 2023-01-11T22:51:06.7521886Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220659.xml (deflated 41%) 2023-01-11T22:51:06.7522648Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220708.xml (deflated 41%) 2023-01-11T22:51:06.7523410Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220717.xml (deflated 43%) 2023-01-11T22:51:06.7524158Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220719.xml (deflated 41%) 2023-01-11T22:51:06.7524925Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220728.xml (deflated 43%) 2023-01-11T22:51:06.7525679Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220731.xml (deflated 43%) 2023-01-11T22:51:06.7526436Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220733.xml (deflated 40%) 2023-01-11T22:51:06.7527180Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220742.xml (deflated 42%) 2023-01-11T22:51:06.7527947Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220745.xml (deflated 42%) 2023-01-11T22:51:06.7528710Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220747.xml (deflated 40%) 2023-01-11T22:51:06.7529475Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220756.xml (deflated 41%) 2023-01-11T22:51:06.7530220Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220759.xml (deflated 41%) 2023-01-11T22:51:06.7530983Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220801.xml (deflated 41%) 2023-01-11T22:51:06.7531744Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220803.xml (deflated 42%) 2023-01-11T22:51:06.7532505Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220806.xml (deflated 41%) 2023-01-11T22:51:06.7533297Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220808.xml (deflated 41%) 2023-01-11T22:51:06.7534110Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220811.xml (deflated 41%) 2023-01-11T22:51:06.7534867Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220813.xml (deflated 41%) 2023-01-11T22:51:06.7535628Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220815.xml (deflated 41%) 2023-01-11T22:51:06.7536363Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220824.xml (deflated 42%) 2023-01-11T22:51:06.7537312Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220827.xml (deflated 42%) 2023-01-11T22:51:06.7538083Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220829.xml (deflated 42%) 2023-01-11T22:51:06.7538848Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220831.xml (deflated 41%) 2023-01-11T22:51:06.7539607Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220841.xml (deflated 41%) 2023-01-11T22:51:06.7540352Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220843.xml (deflated 41%) 2023-01-11T22:51:06.7541109Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220846.xml (deflated 41%) 2023-01-11T22:51:06.7541866Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220848.xml (deflated 40%) 2023-01-11T22:51:06.7542632Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220857.xml (deflated 41%) 2023-01-11T22:51:06.7543384Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220906.xml (deflated 40%) 2023-01-11T22:51:06.7544145Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220915.xml (deflated 40%) 2023-01-11T22:51:06.7544900Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220924.xml (deflated 42%) 2023-01-11T22:51:06.7545663Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220927.xml (deflated 42%) 2023-01-11T22:51:06.7546413Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220929.xml (deflated 41%) 2023-01-11T22:51:06.7547175Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220938.xml (deflated 41%) 2023-01-11T22:51:06.7547940Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220947.xml (deflated 42%) 2023-01-11T22:51:06.7548703Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220949.xml (deflated 41%) 2023-01-11T22:51:06.7549451Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220958.xml (deflated 41%) 2023-01-11T22:51:06.7550212Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221012.xml (deflated 41%) 2023-01-11T22:51:06.7572845Z ##[group]Run # Remove any previous test reports if they exist 2023-01-11T22:51:06.7573236Z # Remove any previous test reports if they exist 2023-01-11T22:51:06.7573556Z rm -f usage-log-*.zip 2023-01-11T22:51:06.7574010Z # this workflow is also run in bazel build test, but we dont generate usage reports for it 2023-01-11T22:51:06.7574447Z # so check to see if the file exists first 2023-01-11T22:51:06.7574756Z if [ -f 'usage_log.txt' ]; then 2023-01-11T22:51:06.7575093Z  zip "usage-log-${FILE_SUFFIX}.zip" 'usage_log.txt' 2023-01-11T22:51:06.7575370Z fi 2023-01-11T22:51:06.7587390Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T22:51:06.7587690Z env: 2023-01-11T22:51:06.7587914Z GIT_DEFAULT_BRANCH: master 2023-01-11T22:51:06.7588181Z GPU_FLAG: --gpus all 2023-01-11T22:51:06.7588538Z DOCKER_CONTAINER_ID: 7c5487d9c02b51c7d3fc373594ecc7b146547c11845b5a7279d2f69d27bfcc5b 2023-01-11T22:51:06.7589002Z FILE_SUFFIX: test-distributed-3-3-linux.8xlarge.nvidia.gpu_10589292222 2023-01-11T22:51:06.7589341Z ##[endgroup] 2023-01-11T22:51:06.8444284Z adding: usage_log.txt (deflated 95%) 2023-01-11T22:51:06.8491279Z ##[group]Run seemethere/upload-artifact-s3@v5 
2023-01-11T22:51:06.8491571Z with: 2023-01-11T22:51:06.8491926Z s3-prefix: pytorch/pytorch/3896099317/2/artifact 2023-01-11T22:51:06.8492210Z retention-days: 14 2023-01-11T22:51:06.8492477Z if-no-files-found: warn 2023-01-11T22:51:06.8492748Z path: test-jsons-*.zip 2023-01-11T22:51:06.8492983Z name: artifact 2023-01-11T22:51:06.8493229Z s3-bucket: gha-artifacts 2023-01-11T22:51:06.8493487Z region: us-east-1 2023-01-11T22:51:06.8493695Z env: 2023-01-11T22:51:06.8493927Z GIT_DEFAULT_BRANCH: master 2023-01-11T22:51:06.8494188Z GPU_FLAG: --gpus all 2023-01-11T22:51:06.8494520Z DOCKER_CONTAINER_ID: 7c5487d9c02b51c7d3fc373594ecc7b146547c11845b5a7279d2f69d27bfcc5b 2023-01-11T22:51:06.8494856Z ##[endgroup] 2023-01-11T22:51:07.2906581Z NOTE: s3-prefix specified, ignoring name parameter 2023-01-11T22:51:07.2907271Z With the provided path, there will be 1 file uploaded 2023-01-11T22:51:07.2907669Z Uploading to s3 prefix: pytorch/pytorch/3896099317/2/artifact 2023-01-11T22:51:07.2919200Z Starting upload of test-jsons-test-distributed-3-3-linux.8xlarge.nvidia.gpu_10589292222.zip 2023-01-11T22:51:07.4558544Z Finished upload of test-jsons-test-distributed-3-3-linux.8xlarge.nvidia.gpu_10589292222.zip 2023-01-11T22:51:07.4727940Z ##[group]Run seemethere/upload-artifact-s3@v5 2023-01-11T22:51:07.4728237Z with: 2023-01-11T22:51:07.4728502Z s3-prefix: pytorch/pytorch/3896099317/2/artifact 2023-01-11T22:51:07.4728799Z retention-days: 14 2023-01-11T22:51:07.4729072Z if-no-files-found: error 2023-01-11T22:51:07.4729334Z path: test-reports-*.zip 2023-01-11T22:51:07.4729592Z name: artifact 2023-01-11T22:51:07.4729842Z s3-bucket: gha-artifacts 2023-01-11T22:51:07.4730101Z region: us-east-1 2023-01-11T22:51:07.4730313Z env: 2023-01-11T22:51:07.4730551Z GIT_DEFAULT_BRANCH: master 2023-01-11T22:51:07.4730817Z GPU_FLAG: --gpus all 2023-01-11T22:51:07.4731149Z DOCKER_CONTAINER_ID: 7c5487d9c02b51c7d3fc373594ecc7b146547c11845b5a7279d2f69d27bfcc5b 2023-01-11T22:51:07.4731489Z ##[endgroup] 
2023-01-11T22:51:07.9144280Z NOTE: s3-prefix specified, ignoring name parameter 2023-01-11T22:51:07.9145320Z With the provided path, there will be 1 file uploaded 2023-01-11T22:51:07.9145728Z Uploading to s3 prefix: pytorch/pytorch/3896099317/2/artifact 2023-01-11T22:51:07.9158712Z Starting upload of test-reports-test-distributed-3-3-linux.8xlarge.nvidia.gpu_10589292222.zip 2023-01-11T22:51:08.1000603Z Finished upload of test-reports-test-distributed-3-3-linux.8xlarge.nvidia.gpu_10589292222.zip 2023-01-11T22:51:08.1157862Z ##[group]Run seemethere/upload-artifact-s3@v5 2023-01-11T22:51:08.1158159Z with: 2023-01-11T22:51:08.1158420Z s3-prefix: pytorch/pytorch/3896099317/2/artifact 2023-01-11T22:51:08.1158714Z retention-days: 14 2023-01-11T22:51:08.1158987Z if-no-files-found: ignore 2023-01-11T22:51:08.1159261Z path: usage-log-*.zip 2023-01-11T22:51:08.1159489Z name: artifact 2023-01-11T22:51:08.1159737Z s3-bucket: gha-artifacts 2023-01-11T22:51:08.1159996Z region: us-east-1 2023-01-11T22:51:08.1160205Z env: 2023-01-11T22:51:08.1160440Z GIT_DEFAULT_BRANCH: master 2023-01-11T22:51:08.1160818Z GPU_FLAG: --gpus all 2023-01-11T22:51:08.1161246Z DOCKER_CONTAINER_ID: 7c5487d9c02b51c7d3fc373594ecc7b146547c11845b5a7279d2f69d27bfcc5b 2023-01-11T22:51:08.1161584Z ##[endgroup] 2023-01-11T22:51:08.5655648Z NOTE: s3-prefix specified, ignoring name parameter 2023-01-11T22:51:08.5656457Z With the provided path, there will be 1 file uploaded 2023-01-11T22:51:08.5657306Z Uploading to s3 prefix: pytorch/pytorch/3896099317/2/artifact 2023-01-11T22:51:08.5668427Z Starting upload of usage-log-test-distributed-3-3-linux.8xlarge.nvidia.gpu_10589292222.zip 2023-01-11T22:51:08.7717736Z Finished upload of usage-log-test-distributed-3-3-linux.8xlarge.nvidia.gpu_10589292222.zip 2023-01-11T22:51:08.7872519Z ##[group]Run # shellcheck disable=SC2156 2023-01-11T22:51:08.7872889Z # shellcheck disable=SC2156 2023-01-11T22:51:08.7873281Z find . 
-iname "core.[1-9]*" -exec docker exec "${DOCKER_CONTAINER_ID}" sh -c "gdb python {} -ex 'bt' -ex 'q'" \; 2023-01-11T22:51:08.7886603Z shell: /usr/bin/bash -e {0} 2023-01-11T22:51:08.7886867Z env: 2023-01-11T22:51:08.7887095Z GIT_DEFAULT_BRANCH: master 2023-01-11T22:51:08.7887363Z GPU_FLAG: --gpus all 2023-01-11T22:51:08.7887722Z DOCKER_CONTAINER_ID: 7c5487d9c02b51c7d3fc373594ecc7b146547c11845b5a7279d2f69d27bfcc5b 2023-01-11T22:51:08.7888047Z ##[endgroup] 2023-01-11T22:51:09.1153127Z ##[group]Run set -x 2023-01-11T22:51:09.1153440Z set -x 2023-01-11T22:51:09.1153723Z python3 -m pip install -r requirements.txt 2023-01-11T22:51:09.1154066Z python3 -m pip install boto3==1.19.12 2023-01-11T22:51:09.1154463Z python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test 2023-01-11T22:51:09.1166711Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T22:51:09.1166987Z env: 2023-01-11T22:51:09.1167227Z GIT_DEFAULT_BRANCH: master 2023-01-11T22:51:09.1167491Z GPU_FLAG: --gpus all 2023-01-11T22:51:09.1167823Z DOCKER_CONTAINER_ID: 7c5487d9c02b51c7d3fc373594ecc7b146547c11845b5a7279d2f69d27bfcc5b 2023-01-11T22:51:09.1168192Z AWS_DEFAULT_REGION: us-east-1 2023-01-11T22:51:09.1168457Z BRANCH: pull/91627 2023-01-11T22:51:09.1168697Z TEST_CONFIG: distributed 2023-01-11T22:51:09.1168956Z SHARD_NUMBER: 3 2023-01-11T22:51:09.1169269Z BUILD_ENVIRONMENT: linux-bionic-cuda11.6-py3.10-gcc7 2023-01-11T22:51:09.1169601Z PR_NUMBER: 91627 2023-01-11T22:51:09.1169864Z PYTORCH_RETRY_TEST_CASES: 1 2023-01-11T22:51:09.1170130Z PYTORCH_OVERRIDE_FLAKY_SIGNAL: 1 2023-01-11T22:51:09.1170448Z SHA1: 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e 2023-01-11T22:51:09.1170724Z TAG: 2023-01-11T22:51:09.1170936Z WORKFLOW_ID: 3896099317 2023-01-11T22:51:09.1171352Z GITHUB_TOKEN: *** 2023-01-11T22:51:09.1171616Z GHA_WORKFLOW_JOB_ID: 10589292222 2023-01-11T22:51:09.1171858Z ##[endgroup] 2023-01-11T22:51:09.1200434Z + python3 -m pip install -r requirements.txt 
2023-01-11T22:51:09.4091009Z Defaulting to user installation because normal site-packages is not writeable 2023-01-11T22:51:09.4951986Z Collecting astunparse 2023-01-11T22:51:09.5122060Z Downloading astunparse-1.6.3-py2.py3-none-any.whl (12 kB) 2023-01-11T22:51:09.5651229Z Collecting expecttest 2023-01-11T22:51:09.5698050Z Downloading expecttest-0.1.4-py3-none-any.whl (6.5 kB) 2023-01-11T22:51:09.6083415Z Collecting future 2023-01-11T22:51:09.6128818Z Downloading future-0.18.2.tar.gz (829 kB) 2023-01-11T22:51:11.4617657Z Collecting hypothesis 2023-01-11T22:51:11.4685865Z Downloading hypothesis-6.62.0-py3-none-any.whl (399 kB) 2023-01-11T22:51:12.1560580Z Collecting numpy 2023-01-11T22:51:12.1606851Z Downloading numpy-1.21.6-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (15.7 MB) 2023-01-11T22:51:12.4675332Z Requirement already satisfied: psutil in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 7)) (5.9.1) 2023-01-11T22:51:12.5860432Z Collecting pyyaml 2023-01-11T22:51:12.5934621Z Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB) 2023-01-11T22:51:12.6130946Z Requirement already satisfied: requests in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 9)) (2.26.0) 2023-01-11T22:51:12.6301357Z Requirement already satisfied: setuptools in /usr/lib/python3.7/site-packages (from -r requirements.txt (line 10)) (49.1.3) 2023-01-11T22:51:12.6923215Z Collecting six 2023-01-11T22:51:12.6994657Z Downloading six-1.16.0-py2.py3-none-any.whl (11 kB) 2023-01-11T22:51:12.7360152Z Collecting types-dataclasses 2023-01-11T22:51:12.7408401Z Downloading types_dataclasses-0.6.6-py3-none-any.whl (2.9 kB) 2023-01-11T22:51:12.7839619Z Collecting typing_extensions 2023-01-11T22:51:12.7887632Z Downloading typing_extensions-4.4.0-py3-none-any.whl (26 kB) 2023-01-11T22:51:12.8448528Z Collecting sympy 2023-01-11T22:51:12.8544426Z Downloading 
sympy-1.10.1-py3-none-any.whl (6.4 MB) 2023-01-11T22:51:13.0434274Z Collecting filelock 2023-01-11T22:51:13.0477823Z Downloading filelock-3.9.0-py3-none-any.whl (9.7 kB) 2023-01-11T22:51:13.1417020Z Collecting networkx 2023-01-11T22:51:13.1506986Z Downloading networkx-2.6.3-py3-none-any.whl (1.9 MB) 2023-01-11T22:51:13.2709504Z Collecting jinja2 2023-01-11T22:51:13.2754736Z Downloading Jinja2-3.1.2-py3-none-any.whl (133 kB) 2023-01-11T22:51:13.3704290Z Collecting wheel<1.0,>=0.23.0 2023-01-11T22:51:13.3750743Z Downloading wheel-0.38.4-py3-none-any.whl (36 kB) 2023-01-11T22:51:13.4150233Z Collecting exceptiongroup>=1.0.0; python_version < "3.11" 2023-01-11T22:51:13.4198561Z Downloading exceptiongroup-1.1.0-py3-none-any.whl (14 kB) 2023-01-11T22:51:13.4713620Z Collecting attrs>=19.2.0 2023-01-11T22:51:13.4758677Z Downloading attrs-22.2.0-py3-none-any.whl (60 kB) 2023-01-11T22:51:13.5610312Z Collecting sortedcontainers<3.0.0,>=2.1.0 2023-01-11T22:51:13.5658681Z Downloading sortedcontainers-2.4.0-py2.py3-none-any.whl (29 kB) 2023-01-11T22:51:13.5758984Z Requirement already satisfied: charset-normalizer~=2.0.0; python_version >= "3" in /home/ec2-user/.local/lib/python3.7/site-packages (from requests->-r requirements.txt (line 9)) (2.0.12) 2023-01-11T22:51:13.5786450Z Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from requests->-r requirements.txt (line 9)) (1.26.14) 2023-01-11T22:51:13.6010726Z Requirement already satisfied: certifi>=2017.4.17 in /home/ec2-user/.local/lib/python3.7/site-packages (from requests->-r requirements.txt (line 9)) (2022.12.7) 2023-01-11T22:51:13.6024382Z Requirement already satisfied: idna<4,>=2.5; python_version >= "3" in /home/ec2-user/.local/lib/python3.7/site-packages (from requests->-r requirements.txt (line 9)) (3.4) 2023-01-11T22:51:13.6303762Z Collecting mpmath>=0.19 2023-01-11T22:51:13.6367596Z Downloading mpmath-1.2.1-py3-none-any.whl (532 kB) 2023-01-11T22:51:13.7866666Z 
Collecting MarkupSafe>=2.0 2023-01-11T22:51:13.7914267Z Downloading MarkupSafe-2.1.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25 kB) 2023-01-11T22:51:13.8013439Z Using legacy 'setup.py install' for future, since package 'wheel' is not installed. 2023-01-11T22:51:13.9883102Z Installing collected packages: six, wheel, astunparse, expecttest, future, exceptiongroup, attrs, sortedcontainers, hypothesis, numpy, pyyaml, types-dataclasses, typing-extensions, mpmath, sympy, filelock, networkx, MarkupSafe, jinja2 2023-01-11T22:51:14.0300933Z WARNING: The script wheel is installed in '/home/ec2-user/.local/bin' which is not on PATH. 2023-01-11T22:51:14.0301596Z Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location. 2023-01-11T22:51:14.0611373Z Running setup.py install for future: started 2023-01-11T22:51:14.7158521Z Running setup.py install for future: finished with status 'done' 2023-01-11T22:51:15.0254207Z WARNING: The script hypothesis is installed in '/home/ec2-user/.local/bin' which is not on PATH. 2023-01-11T22:51:15.0254879Z Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location. 2023-01-11T22:51:17.0144687Z WARNING: The scripts f2py, f2py3 and f2py3.7 are installed in '/home/ec2-user/.local/bin' which is not on PATH. 2023-01-11T22:51:17.0145395Z Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location. 2023-01-11T22:51:25.9367096Z WARNING: The script isympy is installed in '/home/ec2-user/.local/bin' which is not on PATH. 2023-01-11T22:51:25.9367748Z Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location. 
2023-01-11T22:51:27.1865002Z Successfully installed MarkupSafe-2.1.1 astunparse-1.6.3 attrs-22.2.0 exceptiongroup-1.1.0 expecttest-0.1.4 filelock-3.9.0 future-0.18.2 hypothesis-6.62.0 jinja2-3.1.2 mpmath-1.2.1 networkx-2.6.3 numpy-1.21.6 pyyaml-6.0 six-1.16.0 sortedcontainers-2.4.0 sympy-1.10.1 types-dataclasses-0.6.6 typing-extensions-4.4.0 wheel-0.38.4 2023-01-11T22:51:27.2655665Z + python3 -m pip install boto3==1.19.12 2023-01-11T22:51:27.5572421Z Defaulting to user installation because normal site-packages is not writeable 2023-01-11T22:51:28.5447511Z Collecting boto3==1.19.12 2023-01-11T22:51:28.5648297Z Downloading boto3-1.19.12-py3-none-any.whl (131 kB) 2023-01-11T22:51:28.6266668Z Collecting s3transfer<0.6.0,>=0.5.0 2023-01-11T22:51:28.6313742Z Downloading s3transfer-0.5.2-py3-none-any.whl (79 kB) 2023-01-11T22:51:29.9441566Z Collecting botocore<1.23.0,>=1.22.12 2023-01-11T22:51:29.9519594Z Downloading botocore-1.22.12-py3-none-any.whl (8.1 MB) 2023-01-11T22:51:30.1519260Z Collecting jmespath<1.0.0,>=0.7.1 2023-01-11T22:51:30.1566074Z Downloading jmespath-0.10.0-py2.py3-none-any.whl (24 kB) 2023-01-11T22:51:30.1709959Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.14) 2023-01-11T22:51:30.2381834Z Collecting python-dateutil<3.0.0,>=2.1 2023-01-11T22:51:30.2431878Z Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB) 2023-01-11T22:51:30.2612307Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0) 2023-01-11T22:51:30.4711897Z Installing collected packages: jmespath, python-dateutil, botocore, s3transfer, boto3 2023-01-11T22:51:31.3641933Z Successfully installed boto3-1.19.12 botocore-1.22.12 jmespath-0.10.0 python-dateutil-2.8.2 s3transfer-0.5.2 2023-01-11T22:51:31.4207757Z + python3 -m 
tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test 2023-01-11T22:57:15.8687983Z [scribe] Scribe access token not provided, sending report via boto3... 2023-01-11T22:57:15.8690744Z ERROR ENCOUNTERED WHEN UPLOADING TO SCRIBE: Read timeout on endpoint URL: "https://lambda.us-east-1.amazonaws.com/2015-03-31/functions/gh-ci-scribe-proxy/invocations" 2023-01-11T22:57:15.8691179Z 2023-01-11T22:57:15.8691408Z ----- Historic stats comparison result ------ 2023-01-11T22:57:15.8691596Z 2023-01-11T22:57:15.8691824Z job: linux-bionic-cuda11.6-py3.10-gcc7 2023-01-11T22:57:15.8695547Z commit: 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e 2023-01-11T22:57:15.8695885Z 2023-01-11T22:57:15.8696109Z Commit graph (base is most recent master ancestor with at least one S3 report): 2023-01-11T22:57:15.8696358Z 2023-01-11T22:57:15.8698584Z : (master) 2023-01-11T22:57:15.8698947Z | 2023-01-11T22:57:15.8699231Z | * 8419ddda87 (HEAD) total time 3176.65s 2023-01-11T22:57:15.8699507Z | | 2023-01-11T22:57:15.8699709Z | : (2 commits) 2023-01-11T22:57:15.8699937Z |/ 2023-01-11T22:57:15.8700525Z * db2a237763 (base) 18 reports, total time 5298.55s ± 3363.99s 2023-01-11T22:57:15.8700960Z * 2b0abd4ce3 18 reports, total time 5375.52s ± 3469.84s 2023-01-11T22:57:15.8701632Z * f7939b21e1 48 reports, total time 4156.05s ± 3748.80s 2023-01-11T22:57:15.8702084Z * cb3204823e 18 reports, total time 5386.78s ± 3494.12s 2023-01-11T22:57:15.8702494Z * 6e236553f5 18 reports, total time 5359.08s ± 3471.03s 2023-01-11T22:57:15.8703457Z * cce577b391 18 reports, total time 5383.89s ± 3498.12s 2023-01-11T22:57:15.8703937Z * fae821c2f1 18 reports, total time 5181.34s ± 3399.30s 2023-01-11T22:57:15.8704549Z * 0c3659586d 18 reports, total time 5181.30s ± 3409.47s 2023-01-11T22:57:15.8704951Z * 122245985a 18 reports, total time 5175.63s ± 3372.26s 2023-01-11T22:57:15.8705366Z * b797a24259 18 reports, total time 5151.22s ± 3376.21s 2023-01-11T22:57:15.8705641Z | 2023-01-11T22:57:15.8705832Z : 
2023-01-11T22:57:15.8705972Z 2023-01-11T22:57:15.8706139Z Removed (across 1494 suites) 0 tests, totaling 0.00s 2023-01-11T22:57:15.8706491Z Modified (across 0 suites) 0 tests, totaling 0.00s 2023-01-11T22:57:15.8706820Z Added (across 61 suites) 826 tests, totaling +4030.67s 2023-01-11T22:57:15.9346940Z ##[group]Run pytorch/test-infra/.github/actions/teardown-linux@main 2023-01-11T22:57:15.9347278Z with: 2023-01-11T22:57:15.9347507Z env: 2023-01-11T22:57:15.9347749Z GIT_DEFAULT_BRANCH: master 2023-01-11T22:57:15.9347999Z GPU_FLAG: --gpus all 2023-01-11T22:57:15.9348357Z DOCKER_CONTAINER_ID: 7c5487d9c02b51c7d3fc373594ecc7b146547c11845b5a7279d2f69d27bfcc5b 2023-01-11T22:57:15.9348695Z ##[endgroup] 2023-01-11T22:57:15.9368148Z ##[group]Run set -eou pipefail 2023-01-11T22:57:15.9368469Z set -eou pipefail 2023-01-11T22:57:15.9368717Z  2023-01-11T22:57:15.9369036Z echo "Holding runner for 2 hours until all ssh sessions have logged out" 2023-01-11T22:57:15.9369358Z for _ in $(seq 1440); do 2023-01-11T22:57:15.9369659Z  # Break if no ssh session exists anymore 2023-01-11T22:57:15.9369956Z  if [ "$(who)" = "" ]; then 2023-01-11T22:57:15.9370205Z  break 2023-01-11T22:57:15.9370458Z  fi 2023-01-11T22:57:15.9370673Z  echo "." 
2023-01-11T22:57:15.9370915Z  sleep 5 2023-01-11T22:57:15.9371158Z done 2023-01-11T22:57:15.9384386Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T22:57:15.9384686Z env: 2023-01-11T22:57:15.9384928Z GIT_DEFAULT_BRANCH: master 2023-01-11T22:57:15.9385179Z GPU_FLAG: --gpus all 2023-01-11T22:57:15.9385531Z DOCKER_CONTAINER_ID: 7c5487d9c02b51c7d3fc373594ecc7b146547c11845b5a7279d2f69d27bfcc5b 2023-01-11T22:57:15.9385866Z ##[endgroup] 2023-01-11T22:57:15.9414996Z Holding runner for 2 hours until all ssh sessions have logged out 2023-01-11T22:57:15.9488908Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty 2023-01-11T22:57:15.9489319Z # ignore expansion of "docker ps -q" since it could be empty 2023-01-11T22:57:15.9489665Z # shellcheck disable=SC2046 2023-01-11T22:57:15.9489976Z docker stop $(docker ps -q) || true 2023-01-11T22:57:15.9490275Z # Prune all of the docker images 2023-01-11T22:57:15.9490576Z docker system prune -af 2023-01-11T22:57:15.9502926Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T22:57:15.9503216Z env: 2023-01-11T22:57:15.9503463Z GIT_DEFAULT_BRANCH: master 2023-01-11T22:57:15.9503733Z GPU_FLAG: --gpus all 2023-01-11T22:57:15.9504148Z DOCKER_CONTAINER_ID: 7c5487d9c02b51c7d3fc373594ecc7b146547c11845b5a7279d2f69d27bfcc5b 2023-01-11T22:57:15.9504476Z ##[endgroup] 2023-01-11T22:57:16.3159540Z 7c5487d9c02b 2023-01-11T22:57:17.4021739Z Deleted Containers: 2023-01-11T22:57:17.4022142Z 7c5487d9c02b51c7d3fc373594ecc7b146547c11845b5a7279d2f69d27bfcc5b 2023-01-11T22:57:17.4022360Z 2023-01-11T22:57:22.3370036Z Deleted Images: 2023-01-11T22:57:22.3370928Z untagged: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7:fd224c2e6c79d7fdec6408da598bf52bc5b201dd 2023-01-11T22:57:22.3371886Z untagged: 
308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7@sha256:866df6c1171dbe014496717cf2080d6cc72ca611a4e8146525c9ef09640c8ba4 2023-01-11T22:57:22.3372742Z deleted: sha256:09e297797cd8c095524ba49e041c45e57bf05ba16719f01e7240e8549da5beba 2023-01-11T22:57:22.3373174Z deleted: sha256:6d5f0082fbf8c3b01e49961283f44105a5bb12616f0073762021db97f20a16a5 2023-01-11T22:57:22.3373617Z deleted: sha256:19c574c96e47e3d16cc51cc088fed6cef16eaa2174e667ba0b395ac0e3b989bb 2023-01-11T22:57:22.3374076Z deleted: sha256:4fa7af758c23581dbcc2bd92defdc5fea97c8671fa67bbc30888ffbbf96c49a6 2023-01-11T22:57:22.3374555Z deleted: sha256:e68331e1b0f863bbdbd445ed8475d59d2234c9659264a4e49b7b096311445aee 2023-01-11T22:57:22.3374969Z deleted: sha256:69d886418998cd8758c555ed219fca3a457539e9d6f62c41c9664a80c82c4036 2023-01-11T22:57:22.3375394Z deleted: sha256:2368b1fdf0235d155eaa47e75ff379cff06cc82d63e03886a57363b8092d3c83 2023-01-11T22:57:22.3375821Z deleted: sha256:651c1e1b625aeaf8fa65e96ef11487e650a7a6d00ba8ec9fdfe7e89e762dc5c4 2023-01-11T22:57:22.3376397Z deleted: sha256:7b73f298df08c4b3aa849b55bb5e73ded619cb5b786fddcc21ece3c0b3887038 2023-01-11T22:57:22.3377438Z deleted: sha256:716f28b4b433958e5d1b9839a20dd22f9986f9c1b42fb95552f13d6bfd291efe 2023-01-11T22:57:22.3378158Z deleted: sha256:e4495993276176c504228e66b2fd6f348523c5af66f9292b2f7ea12acefcd606 2023-01-11T22:57:22.3378668Z deleted: sha256:d6f5fbd8782783697c73f4bcbce91a05e126575f72f0f58e1e9465aa57640a92 2023-01-11T22:57:22.3379099Z deleted: sha256:0b7725c897ee2681e3b1ff00aea6c14805c8050758cfa1010f561c9713934014 2023-01-11T22:57:22.3379539Z deleted: sha256:fdd864c6750bdd24f8cfb131673c6f04087e3cbecac2c1a9b3c30fefbd6d3070 2023-01-11T22:57:22.3379991Z deleted: sha256:bfaeaf77f180f62f3994ccdc2be80dd2ef7f4d25ffc8c9497dc51e6cff69711e 2023-01-11T22:57:22.3380461Z deleted: sha256:02f5a9d8be5a1bdd5a350d4c47147fd3dd46bfcedc7637f53a8a692720381fc2 2023-01-11T22:57:22.3380930Z deleted: 
sha256:e22cc66e4fd2e491fb4ae8194c35d6b1789f9f5d01e1dfbedf1c266c3a1537de 2023-01-11T22:57:22.3381343Z deleted: sha256:1536d02ae84ab410916541408cf2935f122735cc5d128324f6f82fbbeb913e80 2023-01-11T22:57:22.3381767Z deleted: sha256:a8374bc83a4bf3a838aaf8ee71b4a8281ac4eef801473b007f88b2f0efadc6c6 2023-01-11T22:57:22.3382192Z deleted: sha256:9381921b1e612b2d23517d24662a39f00be43efe2412e440ef41b487a48cb389 2023-01-11T22:57:22.3382616Z deleted: sha256:ae063d6cbeae3688a5a7bb8694d431dbb9792bfb7ff2908d3b25842f9586fc8e 2023-01-11T22:57:22.3383040Z deleted: sha256:595397ba048a351c7b09c25b5eba4cbb916c2db40dd80bde4d95c2b51b766045 2023-01-11T22:57:22.3383534Z deleted: sha256:7a2da4ffb8ea2b858fcdfc92f6e640dbb6a083b39dfa5e7aa87cd1296f9314e1 2023-01-11T22:57:22.3384012Z deleted: sha256:eddd66bfdbb5913f133ffda8d967ee0235f9f121434112ea8da4cfdf5f9ebbf4 2023-01-11T22:57:22.3384437Z deleted: sha256:c9392d92ee837d52b35b41e4de67d213886d844cecd9e769d9284dc21070aee8 2023-01-11T22:57:22.3384918Z deleted: sha256:fba16b8beafe9efa854d93f0e92718750ec97ced755ace0f6f51bbd5d1964f91 2023-01-11T22:57:22.3385353Z deleted: sha256:a09592e27d2e6896b9029f31e269787c92761fd19a48528851c48d85221cd4bc 2023-01-11T22:57:22.3385760Z deleted: sha256:8b2e3a8416af60625ccb9a8562891c1b9e85c2ee05f103b190ad5040313bf1f1 2023-01-11T22:57:22.3386182Z deleted: sha256:135caddb443044601f433d421e5c0f5d8ab02dff69cf2df9024a8ecb97c8948c 2023-01-11T22:57:22.3386595Z deleted: sha256:112b60db47585e101175390e102a66908f0f1175510c5e5d5f10b7e4e0c9769b 2023-01-11T22:57:22.3387010Z deleted: sha256:d9e5dd4e760b68190c010a9042c842afb0bcf3d4477a334d0e9c2d9c302ecb3c 2023-01-11T22:57:22.3387444Z deleted: sha256:7d37673dfb91518db6e68e9637f3db142a4eaccb9f548ab99d14ac52b3672325 2023-01-11T22:57:22.3387846Z deleted: sha256:12784d32c23b63941ebc8adf713d7167158c38c771d3b5f94506e036b6273dbd 2023-01-11T22:57:22.3388264Z deleted: sha256:395db4c8bde7e390accfa81ff84376004b78566423bd0a3bd7559e12e66759bc 2023-01-11T22:57:22.3388706Z deleted: 
sha256:a7aa04ee64333d427277f47b8e07dee6cc566b132a18ee166c305c890d4fd3bb 2023-01-11T22:57:22.3389116Z deleted: sha256:10a0e8d138614f29a13aecd15175039c7f86ca04fc588b439f56284b3c0e292a 2023-01-11T22:57:22.3389555Z deleted: sha256:68d97ecc8d2ec5f756b4a1b7e7451a413ed19f3d7c3ce52c81b30dc00fd96185 2023-01-11T22:57:22.3389986Z deleted: sha256:e833f7b95e0efaaf775293760365243991395e796d587deffe588e50fe7a9f1e 2023-01-11T22:57:22.3390533Z deleted: sha256:cbcd1e7502e949614d05d5d0a34316fd62665c3e030b3656d9c488cfee1eca34 2023-01-11T22:57:22.3390929Z deleted: sha256:a90ce4a75d9d408df6349607788566125c2662c603dc8f84c767b2256273ff12 2023-01-11T22:57:22.3391353Z deleted: sha256:bdee06da46fd67ff14bbc7286b40ef15174551b6452ca633be8576376a3dbddb 2023-01-11T22:57:22.3391791Z deleted: sha256:2cafe83ace87d1a83f30f8d458001f3a315a606d5251c20543cbd6604499aa73 2023-01-11T22:57:22.3392212Z deleted: sha256:cc1f7fb1208e7b05b48d3b3b2648946adaa723b10df2574c7db87ed9112b1510 2023-01-11T22:57:22.3392647Z deleted: sha256:9027ec66ecfd6490ecf157cdf665428dad438f54565d9d85263211da30e6684d 2023-01-11T22:57:22.3393089Z deleted: sha256:a27541cda46a9ede5932b1a1807360e2f6ada5bb0c30bfa3aa953d59dcc1bc5c 2023-01-11T22:57:22.3393651Z deleted: sha256:7b927de6f9fdeb74acee54e7654d04e2614a112cafb477b2433db34dcc7ebe28 2023-01-11T22:57:22.3394075Z deleted: sha256:8298c8753925ad5124b0611c4101e92ec1f877252f8320f5503ebf3e4e7e1314 2023-01-11T22:57:22.3394488Z deleted: sha256:0b741a2d83533cb3b47b912506d486fd477d8c8a1c520a0f9d7d62edfc55487d 2023-01-11T22:57:22.3394916Z deleted: sha256:9412a0a6fd6057a9939fed5beadf64568de1d230c0325a628e396f5b76444bbb 2023-01-11T22:57:22.3395354Z deleted: sha256:0a294dad6a5df3fab407cd28b31eadaf9f65a5bdbf8c7a0db349d3401b537cef 2023-01-11T22:57:22.3395809Z deleted: sha256:b42ee7d4f5715886c70d5cfef4724e889f773a20741ebc4ed9c1771eb09ad634 2023-01-11T22:57:22.3396241Z deleted: sha256:a68106c7b0e03f1e0fa9ad405ce403e2931174825bdcd8e259522ef40ec8a617 2023-01-11T22:57:22.3396670Z deleted: 
sha256:ad403ee05a64f55c765e665bd3f25a71650857123282dc1bad2af81f5665cda5 2023-01-11T22:57:22.3397076Z deleted: sha256:eb731dc1382c6cc80193d0454f740fc55f441c32f08cd0ce1b8784e7840e53df 2023-01-11T22:57:22.3397493Z deleted: sha256:45bbe3d22998589317c7f6c4dd591475423bb37ca9b922529c5878653483b18d 2023-01-11T22:57:22.3397727Z 2023-01-11T22:57:22.3467456Z Total reclaimed space: 21.38GB 2023-01-11T22:57:22.3526519Z Post job cleanup. 2023-01-11T22:57:22.3568035Z Post job cleanup. 2023-01-11T22:57:22.4931435Z [command]/usr/bin/git version 2023-01-11T22:57:22.4988113Z git version 2.38.1 2023-01-11T22:57:22.5050942Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/24cfc83d-ec90-4442-a469-29b41caf5fef' before making global git config changes 2023-01-11T22:57:22.5052752Z Adding repository directory to the temporary git global config as a safe directory 2023-01-11T22:57:22.5058115Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/pytorch/pytorch 2023-01-11T22:57:22.5097518Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2023-01-11T22:57:22.5134538Z [command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || : 2023-01-11T22:57:22.5461101Z Entering 'android/libs/fbjni' 2023-01-11T22:57:22.5503848Z Entering 'third_party/FP16' 2023-01-11T22:57:22.5549433Z Entering 'third_party/FXdiv' 2023-01-11T22:57:22.5591838Z Entering 'third_party/NNPACK' 2023-01-11T22:57:22.5634395Z Entering 'third_party/QNNPACK' 2023-01-11T22:57:22.5677058Z Entering 'third_party/VulkanMemoryAllocator' 2023-01-11T22:57:22.5721403Z Entering 'third_party/XNNPACK' 2023-01-11T22:57:22.5775741Z Entering 'third_party/benchmark' 2023-01-11T22:57:22.5818363Z Entering 'third_party/cpuinfo' 2023-01-11T22:57:22.5862152Z Entering 'third_party/cub' 2023-01-11T22:57:22.5904693Z Entering 'third_party/cudnn_frontend' 
2023-01-11T22:57:22.5953232Z Entering 'third_party/cutlass' 2023-01-11T22:57:22.6004265Z Entering 'third_party/eigen' 2023-01-11T22:57:22.6047944Z Entering 'third_party/fbgemm' 2023-01-11T22:57:22.6090662Z Entering 'third_party/fbgemm/third_party/asmjit' 2023-01-11T22:57:22.6134019Z Entering 'third_party/fbgemm/third_party/cpuinfo' 2023-01-11T22:57:22.6176188Z Entering 'third_party/fbgemm/third_party/googletest' 2023-01-11T22:57:22.6217433Z Entering 'third_party/fbgemm/third_party/hipify_torch' 2023-01-11T22:57:22.6262835Z Entering 'third_party/flatbuffers' 2023-01-11T22:57:22.6307250Z Entering 'third_party/fmt' 2023-01-11T22:57:22.6349448Z Entering 'third_party/foxi' 2023-01-11T22:57:22.6390415Z Entering 'third_party/gemmlowp/gemmlowp' 2023-01-11T22:57:22.6432273Z Entering 'third_party/gloo' 2023-01-11T22:57:22.6474947Z Entering 'third_party/googletest' 2023-01-11T22:57:22.6517001Z Entering 'third_party/ideep' 2023-01-11T22:57:22.6558466Z Entering 'third_party/ideep/mkl-dnn' 2023-01-11T22:57:22.6602701Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN' 2023-01-11T22:57:22.6652272Z Entering 'third_party/ios-cmake' 2023-01-11T22:57:22.6694401Z Entering 'third_party/ittapi' 2023-01-11T22:57:22.6737204Z Entering 'third_party/kineto' 2023-01-11T22:57:22.6777701Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2023-01-11T22:57:22.6819903Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2023-01-11T22:57:22.6864308Z Entering 'third_party/nccl/nccl' 2023-01-11T22:57:22.6906163Z Entering 'third_party/neon2sse' 2023-01-11T22:57:22.6948101Z Entering 'third_party/nlohmann' 2023-01-11T22:57:22.6992809Z Entering 'third_party/onnx' 2023-01-11T22:57:22.7048943Z Entering 'third_party/onnx/third_party/benchmark' 2023-01-11T22:57:22.7092104Z Entering 'third_party/onnx/third_party/pybind11' 2023-01-11T22:57:22.7135948Z Entering 'third_party/onnx-tensorrt' 2023-01-11T22:57:22.7177110Z Entering 'third_party/onnx-tensorrt/third_party/onnx' 
2023-01-11T22:57:22.7224617Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark' 2023-01-11T22:57:22.7268666Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11' 2023-01-11T22:57:22.7311674Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2023-01-11T22:57:22.7359256Z Entering 'third_party/pocketfft' 2023-01-11T22:57:22.7400876Z Entering 'third_party/protobuf' 2023-01-11T22:57:22.7447602Z Entering 'third_party/protobuf/third_party/benchmark' 2023-01-11T22:57:22.7490452Z Entering 'third_party/protobuf/third_party/googletest' 2023-01-11T22:57:22.7534181Z Entering 'third_party/psimd' 2023-01-11T22:57:22.7575693Z Entering 'third_party/pthreadpool' 2023-01-11T22:57:22.7618565Z Entering 'third_party/pybind11' 2023-01-11T22:57:22.7661304Z Entering 'third_party/python-enum' 2023-01-11T22:57:22.7703870Z Entering 'third_party/python-peachpy' 2023-01-11T22:57:22.7745698Z Entering 'third_party/python-six' 2023-01-11T22:57:22.7786981Z Entering 'third_party/sleef' 2023-01-11T22:57:22.7829732Z Entering 'third_party/tbb' 2023-01-11T22:57:22.7873980Z Entering 'third_party/tensorpipe' 2023-01-11T22:57:22.7918525Z Entering 'third_party/tensorpipe/third_party/googletest' 2023-01-11T22:57:22.7960480Z Entering 'third_party/tensorpipe/third_party/libnop' 2023-01-11T22:57:22.8001191Z Entering 'third_party/tensorpipe/third_party/libuv' 2023-01-11T22:57:22.8042986Z Entering 'third_party/tensorpipe/third_party/pybind11' 2023-01-11T22:57:22.8083345Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2023-01-11T22:57:22.8128043Z Entering 'third_party/zstd' 2023-01-11T22:57:22.8188080Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2023-01-11T22:57:22.8216435Z http.https://github.com/.extraheader 2023-01-11T22:57:22.8226741Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader 2023-01-11T22:57:22.8262877Z 
[command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || : 2023-01-11T22:57:22.8573723Z Entering 'android/libs/fbjni' 2023-01-11T22:57:22.8598276Z http.https://github.com/.extraheader 2023-01-11T22:57:22.8631235Z Entering 'third_party/FP16' 2023-01-11T22:57:22.8655763Z http.https://github.com/.extraheader 2023-01-11T22:57:22.8689138Z Entering 'third_party/FXdiv' 2023-01-11T22:57:22.8714836Z http.https://github.com/.extraheader 2023-01-11T22:57:22.8747651Z Entering 'third_party/NNPACK' 2023-01-11T22:57:22.8771786Z http.https://github.com/.extraheader 2023-01-11T22:57:22.8804966Z Entering 'third_party/QNNPACK' 2023-01-11T22:57:22.8830444Z http.https://github.com/.extraheader 2023-01-11T22:57:22.8863561Z Entering 'third_party/VulkanMemoryAllocator' 2023-01-11T22:57:22.8887663Z http.https://github.com/.extraheader 2023-01-11T22:57:22.8920168Z Entering 'third_party/XNNPACK' 2023-01-11T22:57:22.8945284Z http.https://github.com/.extraheader 2023-01-11T22:57:22.8988554Z Entering 'third_party/benchmark' 2023-01-11T22:57:22.9012491Z http.https://github.com/.extraheader 2023-01-11T22:57:22.9044854Z Entering 'third_party/cpuinfo' 2023-01-11T22:57:22.9069578Z http.https://github.com/.extraheader 2023-01-11T22:57:22.9102228Z Entering 'third_party/cub' 2023-01-11T22:57:22.9126671Z http.https://github.com/.extraheader 2023-01-11T22:57:22.9159535Z Entering 'third_party/cudnn_frontend' 2023-01-11T22:57:22.9184565Z http.https://github.com/.extraheader 2023-01-11T22:57:22.9222888Z Entering 'third_party/cutlass' 2023-01-11T22:57:22.9247177Z http.https://github.com/.extraheader 2023-01-11T22:57:22.9286893Z Entering 'third_party/eigen' 2023-01-11T22:57:22.9311809Z http.https://github.com/.extraheader 2023-01-11T22:57:22.9346643Z Entering 'third_party/fbgemm' 2023-01-11T22:57:22.9370952Z http.https://github.com/.extraheader 
2023-01-11T22:57:22.9403725Z Entering 'third_party/fbgemm/third_party/asmjit' 2023-01-11T22:57:22.9428574Z http.https://github.com/.extraheader 2023-01-11T22:57:22.9460968Z Entering 'third_party/fbgemm/third_party/cpuinfo' 2023-01-11T22:57:22.9484851Z http.https://github.com/.extraheader 2023-01-11T22:57:22.9518607Z Entering 'third_party/fbgemm/third_party/googletest' 2023-01-11T22:57:22.9543605Z http.https://github.com/.extraheader 2023-01-11T22:57:22.9576232Z Entering 'third_party/fbgemm/third_party/hipify_torch' 2023-01-11T22:57:22.9601111Z http.https://github.com/.extraheader 2023-01-11T22:57:22.9634521Z Entering 'third_party/flatbuffers' 2023-01-11T22:57:22.9658972Z http.https://github.com/.extraheader 2023-01-11T22:57:22.9693195Z Entering 'third_party/fmt' 2023-01-11T22:57:22.9719022Z http.https://github.com/.extraheader 2023-01-11T22:57:22.9751433Z Entering 'third_party/foxi' 2023-01-11T22:57:22.9775357Z http.https://github.com/.extraheader 2023-01-11T22:57:22.9807259Z Entering 'third_party/gemmlowp/gemmlowp' 2023-01-11T22:57:22.9832040Z http.https://github.com/.extraheader 2023-01-11T22:57:22.9864692Z Entering 'third_party/gloo' 2023-01-11T22:57:22.9888894Z http.https://github.com/.extraheader 2023-01-11T22:57:22.9921962Z Entering 'third_party/googletest' 2023-01-11T22:57:22.9947315Z http.https://github.com/.extraheader 2023-01-11T22:57:22.9979859Z Entering 'third_party/ideep' 2023-01-11T22:57:23.0003961Z http.https://github.com/.extraheader 2023-01-11T22:57:23.0035326Z Entering 'third_party/ideep/mkl-dnn' 2023-01-11T22:57:23.0059867Z http.https://github.com/.extraheader 2023-01-11T22:57:23.0093850Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN' 2023-01-11T22:57:23.0119423Z http.https://github.com/.extraheader 2023-01-11T22:57:23.0160220Z Entering 'third_party/ios-cmake' 2023-01-11T22:57:23.0184500Z http.https://github.com/.extraheader 2023-01-11T22:57:23.0216366Z Entering 'third_party/ittapi' 2023-01-11T22:57:23.0240824Z 
http.https://github.com/.extraheader 2023-01-11T22:57:23.0273075Z Entering 'third_party/kineto' 2023-01-11T22:57:23.0297385Z http.https://github.com/.extraheader 2023-01-11T22:57:23.0330075Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2023-01-11T22:57:23.0355239Z http.https://github.com/.extraheader 2023-01-11T22:57:23.0387523Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2023-01-11T22:57:23.0411348Z http.https://github.com/.extraheader 2023-01-11T22:57:23.0444862Z Entering 'third_party/nccl/nccl' 2023-01-11T22:57:23.0469600Z http.https://github.com/.extraheader 2023-01-11T22:57:23.0501876Z Entering 'third_party/neon2sse' 2023-01-11T22:57:23.0527216Z http.https://github.com/.extraheader 2023-01-11T22:57:23.0559297Z Entering 'third_party/nlohmann' 2023-01-11T22:57:23.0584454Z http.https://github.com/.extraheader 2023-01-11T22:57:23.0618079Z Entering 'third_party/onnx' 2023-01-11T22:57:23.0642466Z http.https://github.com/.extraheader 2023-01-11T22:57:23.0688540Z Entering 'third_party/onnx/third_party/benchmark' 2023-01-11T22:57:23.0713513Z http.https://github.com/.extraheader 2023-01-11T22:57:23.0746189Z Entering 'third_party/onnx/third_party/pybind11' 2023-01-11T22:57:23.0769974Z http.https://github.com/.extraheader 2023-01-11T22:57:23.0804934Z Entering 'third_party/onnx-tensorrt' 2023-01-11T22:57:23.0829654Z http.https://github.com/.extraheader 2023-01-11T22:57:23.0861233Z Entering 'third_party/onnx-tensorrt/third_party/onnx' 2023-01-11T22:57:23.0884659Z http.https://github.com/.extraheader 2023-01-11T22:57:23.0922704Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark' 2023-01-11T22:57:23.0947868Z http.https://github.com/.extraheader 2023-01-11T22:57:23.0980862Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11' 2023-01-11T22:57:23.1006249Z http.https://github.com/.extraheader 2023-01-11T22:57:23.1039756Z Entering 
'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2023-01-11T22:57:23.1066152Z http.https://github.com/.extraheader 2023-01-11T22:57:23.1104842Z Entering 'third_party/pocketfft' 2023-01-11T22:57:23.1129515Z http.https://github.com/.extraheader 2023-01-11T22:57:23.1161283Z Entering 'third_party/protobuf' 2023-01-11T22:57:23.1186399Z http.https://github.com/.extraheader 2023-01-11T22:57:23.1223064Z Entering 'third_party/protobuf/third_party/benchmark' 2023-01-11T22:57:23.1247498Z http.https://github.com/.extraheader 2023-01-11T22:57:23.1280208Z Entering 'third_party/protobuf/third_party/googletest' 2023-01-11T22:57:23.1304573Z http.https://github.com/.extraheader 2023-01-11T22:57:23.1339103Z Entering 'third_party/psimd' 2023-01-11T22:57:23.1364776Z http.https://github.com/.extraheader 2023-01-11T22:57:23.1397017Z Entering 'third_party/pthreadpool' 2023-01-11T22:57:23.1421778Z http.https://github.com/.extraheader 2023-01-11T22:57:23.1453375Z Entering 'third_party/pybind11' 2023-01-11T22:57:23.1478290Z http.https://github.com/.extraheader 2023-01-11T22:57:23.1510795Z Entering 'third_party/python-enum' 2023-01-11T22:57:23.1535404Z http.https://github.com/.extraheader 2023-01-11T22:57:23.1567135Z Entering 'third_party/python-peachpy' 2023-01-11T22:57:23.1591640Z http.https://github.com/.extraheader 2023-01-11T22:57:23.1624487Z Entering 'third_party/python-six' 2023-01-11T22:57:23.1649106Z http.https://github.com/.extraheader 2023-01-11T22:57:23.1680991Z Entering 'third_party/sleef' 2023-01-11T22:57:23.1705707Z http.https://github.com/.extraheader 2023-01-11T22:57:23.1738660Z Entering 'third_party/tbb' 2023-01-11T22:57:23.1763052Z http.https://github.com/.extraheader 2023-01-11T22:57:23.1797374Z Entering 'third_party/tensorpipe' 2023-01-11T22:57:23.1822320Z http.https://github.com/.extraheader 2023-01-11T22:57:23.1854331Z Entering 'third_party/tensorpipe/third_party/googletest' 2023-01-11T22:57:23.1878957Z http.https://github.com/.extraheader 
2023-01-11T22:57:23.1911586Z Entering 'third_party/tensorpipe/third_party/libnop' 2023-01-11T22:57:23.1935396Z http.https://github.com/.extraheader 2023-01-11T22:57:23.1966927Z Entering 'third_party/tensorpipe/third_party/libuv' 2023-01-11T22:57:23.1992443Z http.https://github.com/.extraheader 2023-01-11T22:57:23.2025767Z Entering 'third_party/tensorpipe/third_party/pybind11' 2023-01-11T22:57:23.2049474Z http.https://github.com/.extraheader 2023-01-11T22:57:23.2081250Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2023-01-11T22:57:23.2106183Z http.https://github.com/.extraheader 2023-01-11T22:57:23.2141636Z Entering 'third_party/zstd' 2023-01-11T22:57:23.2166100Z http.https://github.com/.extraheader 2023-01-11T22:57:23.2470910Z Cleaning up orphan processes